=Paper=
{{Paper
|id=Vol-1819/edudm2017-paper5
|storemode=property
|title=Lexico-Semantic Analysis of Essays in Hindi Language
|pdfUrl=https://ceur-ws.org/Vol-1819/edudm2017-paper5.pdf
|volume=Vol-1819
|authors=Seema Mahato,Ani Thomas
|dblpUrl=https://dblp.org/rec/conf/indiaSE/MahatoT17
}}
====
LEXICO-SEMANTIC ANALYSIS OF ESSAYS IN HINDI LANGUAGE

Seema Mahato (Research Scholar) and Dr. (Mrs.) Ani Thomas (Professor)
Dept. of Computer Applications
Dr. C.V. Raman University, Bilaspur (C.G.), India, seema_mahato@yahoo.co.in
Bhilai Institute of Technology, Durg (C.G.), India, tpthomas22@yahoo.com

ABSTRACT
Many researchers consider the essay a tool for judging learning outcomes and intellectual capabilities and for assessing organized, integrated thought. With the growing number of university students and of distance and ubiquitous e-learning approaches, interest in Computer-based Assessment Systems has risen rapidly over the past decade. Manual grading of students' essays requires a significant amount of time and hard work, is an expensive activity for educational institutions, and calls for a practical solution. Automated essay grading or evaluation systems are a solution to this need. Nowadays, most online competitive and university examinations try to have human-written essays evaluated by examiners or teachers as well as by machines such as automated essay grading systems. Such a system has to focus significantly on vocabulary, text syntax, and text semantics. This paper reviews the existing automated essay grading systems and their functional technologies, and proposes a methodology to overcome the issues that arise during evaluation, such as grammatical and semantic errors and the influence of local and regional languages in Hindi essays.

CCS Concepts
• Computing methodologies → Artificial intelligence → Natural language processing → Lexical semantics

Keywords
Automated Essay Grading, AEG, NLP, Text Processing, Essay Evaluation, Semantic Attributes

Copyright © 2017 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1. INTRODUCTION
Most essay grading systems available on the market are used for grading essays written in pure English or pure European languages. In India, we have almost 21 recognized languages and 27 local languages, and the influence of these languages can easily be seen in Hindi essays. For example, in the texts “दिवाली के दिन लक्ष्मी के आगमन के विश्वास के साथ लोग अपने घरो के आँगन मे रंगोली या आलपोना से सजाते है” and “राखी के पुण्य पर्व पर घर मे किसी के निधन से यह त्योहार खोटा हो जाता है”, the influence of a regional or local language can be clearly identified. At present, Computer-based Assessment Systems (CbAS) whose primary focus is the automatic grading and evaluation of Hindi essays are rare. Automated Essay Grading (AEG) systems are also known as Automated Writing Evaluation Systems or Automated Essay Assessors. The aim of such a system is to score students' essays on a specific topic and to give each student feedback on the deficiencies in his or her essay. These systems reduce the burden on the evaluator, who is otherwise unable to give personalized attention to each student's needs. Such systems provide a human-like capability for reading and writing, along with regular feedback that helps writers and students improve their writing skills.

2. STATE OF THE ART
The currently available systems for automated assessment are Project Essay Grade (PEG), Intelligent Essay Assessor (IEA), Educational Testing Service I (ETS I), Electronic Essay Rater (E-Rater), Bayesian Essay Test Scoring sYstem (BETSY), Intelligent Essay Marking System (IEMS), Schema Extract Analyse and Report (SEAR), and The Essay Scoring Tool (TEST) [1]. The working techniques of a few of these AEG systems are discussed here.

PEG is one of the earliest implementations of automated essay grading. It is a statistical approach based on the assumption that the quality of an essay is reflected by measurable proxes [2]. Proxes are computer approximations or measures of trins: essay length in words represents the trin of fluency; counts of prepositions, relative pronouns and other parts of speech act as an indicator of the complexity of sentence structure; and variation in word length indicates diction. Previously manually marked essays serve as a training set for calculating the regression coefficients. The other factor is the intrinsic variables (trins), which are intended to simulate human rater grading. Natural Language Processing (NLP) techniques and lexical content are not considered in PEG at all.

IEA is a domain-independent tool based on the Latent Semantic Analysis (LSA) technique, which was originally designed for indexing documents and text retrieval [3]. LSA represents documents and their word content in a large two-dimensional matrix semantic space. Using a matrix algebra technique known as Singular Value Decomposition (SVD), new relationships between words and documents are uncovered, and existing relationships are modified to represent their true significance more accurately [4][5]. The key features of IEA are relatively low unit cost, quick customized feedback, and plagiarism detection. The system is well suited to analyzing and scoring expository essays on science, social studies, history, medicine or business topics, and it automatically assesses and critiques electronically submitted text essays [1].

E-Rater is a statistical and corpus-based approach that uses the Microsoft Natural Language Processing tool to parse the essay and extract linguistic features, which are finally evaluated against a benchmark set of human-graded essays [6][7]. E-Rater includes domain-based analysis of the discourse structure, the syntactic structure and the vocabulary usage. It is composed of five main independent modules. Three of these modules identify features for the scoring-guide criteria of syntactic variety, organization of ideas and vocabulary usage of an essay. The remaining modules are used to select and weigh predictive features for essay scoring and to compute the final score. A feedback component provides additional feedback about qualities of writing related to topic and fluency only.

IEMS can be used both as an assessment tool and for diagnostic and tutoring purposes in many content-based subjects [8]. It is based on the Pattern Indexing Neural Network (the Indextron). The Indextron is defined as a specific clusterisation algorithm and can be implemented as a neural network embedded in an intelligent tutoring system for fast grading, which provides feedback to students.

TEST, the first AES tool for Hindi, is domain based and needs prior knowledge before checking an essay. It uses quality of content, local coherence, factual accuracy, and global coherence as scoring parameters [9]. Each sentence in an essay is connected to the previous sentences, and the degree of this connection measures the coherence of the sentence pairs. Local coherence measures inter-sentence similarity, whereas global coherence classifies the structure of an essay as good, average or bad. TEST takes human-graded essays as training sets and rates them as good essays and bad essays. The fact evaluation module contains topic-specific keywords, a list of essays, a list of correct facts, and a list of incorrect facts, and it produces individual essay reports and scores, with an N × 1 score matrix for internal use by TEST. It does not include grammar checking or spell-checking.

3. HINDI DEPENDENCY TREEBANK
The Hindi Dependency Treebank (henceforth HDT) uses karaka, a syntactico-semantic relation, as an intermediary step to express semantic relations through vibhaktis [10]. Each karaka has a default vibhakti. In linguistics, grammatical relations (also called grammatical functions, grammatical roles, or syntactic functions) refer to functional relationships between constituents in a clause [11]. The role of grammatical relations in theories of grammar is greatest in dependency grammars, which tend to posit dozens of distinct grammatical relations. Every head-dependent dependency bears a grammatical function. Semantic analysis can be done using HDT because it includes part-of-speech, chunk, and dependency information. For each sentence, the output of HDT has four columns:

1st column: the token or chunk id, such as 1, 1.1, 2, 2.2, etc.
2nd column: the actual word or word group in the sentence, having the attribute 'name' for naming.
3rd column: the part of speech.
4th column: the feature structure, which holds morphological information, grammatical roles, semantic information, etc.

4. LEXICAL ANALYSIS OF HINDI SENTENCES
A careful study of the research of this decade helped us understand the existing AEG systems based on AI and machine learning techniques, together with their limitations, and to propose a methodology that could work in the Indian context. The methodology is based on a series of semantic evaluations.

To check for grammatical or semantic errors, the HDT analysis of each sentence is captured. In this methodology, the features obtained from the treebank are used to develop machine learning techniques that identify the errors. The machine learning procedure analyzes each noun, pronoun and verb and the postposition associated with it. It also analyzes the number and gender agreement between the noun or pronoun and the verb.

Hindi is a free-word-order language, but its default sentence word order is Subject-Object-Verb (SOV). The object may be direct, indirect, or both. Where English uses prepositions, Hindi uses postpositions (vibhaktis, or case markers), which are combined with a noun, a pronoun or, more generally, a noun phrase. Vibhaktis such as ने, को, से, का, के, में, etc. are attached as suffixes to the noun or pronoun. Sentences in Hindi may follow the default word order conventions for coding the information of grammatical relations. Hindi has rich morphological case, in which the subject, the object and other verb arguments are identified by the case markers they bear (e.g. nominative, accusative, dative, genitive, ergative, etc.). To be grammatically correct, the subject of a sentence must agree with the finite verb in person, number, and gender. A sentence is considered ungrammatical if it contains a syntactic error. Let us consider the following sentences:

Eg1. राम ने रावण मारा.
Eg2. लड़की स्कूल में जाती हैं./ लड़की स्कूल को जाती हैं.
Eg3. गोपाल अपने भाई से लंबी हैं ।

Although Eg1 is ungrammatical because “को” is missing after रावण, HDT considers it to be grammatically correct, as shown by the dependency structures of the sentences “राम ने रावण मारा”, “राम ने रावण को मारा” and “राम रावण को मारा” in Figures 1, 2 and 3 respectively, where k1 indicates Karta karaka / nominative case (having the ‘ने’ case marker) and k2 represents Karma karaka / accusative case (having the ‘को’ case marker).

This indicates that the absence of a case marker is not treated as a grammatical error by HDT. The sentences in Eg2 are grammatical, but they would be more proper without “में” and “को”, as in “लड़की स्कूल जाती हैं”. In Eg3, गोपाल अपने भाई से लंबी हैं, the predicate “लंबी” does not agree with the subject “गोपाल”: the subject has masculine gender whereas the predicate here is feminine. Now consider the sentence “चला जाऐंगा अपने आप सुनील”, which is ungrammatical too. Hence, gender and number agreement helps in lexical analysis.

5. SEMANTIC ANALYSIS
Semantic knowledge provides information such as animacy, named entity categories and verb selectional restrictions. Named entity tag information is used to match the category of a pronoun with that of its referent. The semantic class information (noun category) is used for finding facts and for fact evaluation in essays. Pairs whose semantic features do not match are filtered out. Semantic analysis is performed using the semantic knowledge for each word. "Semantic analysis" refers to a formal analysis of meaning, and "computational" refers to approaches that in principle support effective implementation [12]. Semantic analysis involves identifying the intended meaning at the word level, i.e. word-sense disambiguation, because a word has multiple meanings in different contexts. Semantic analysis also helps in understanding how different sentences and textual elements fit together. The analysis begins with the computational identification of word senses, explores the interrelationships between the elements of a sentence and the relations between sentences (e.g., coreference), and examines the semantic relations and sentiment. The dependency structures shown in Figures 1, 2 and 3 indicate that HDT treats the meaning of these sentences as correct although they are grammatically incorrect. The dependency structure shows the relation between noun phrases and verb phrases that are semantically interrelated. Semantic knowledge analyzes multiple words and identifies relations between them such as hypernymy and hyponymy, and meronymy and holonymy.

Hindi WordNet is a system for bringing together different lexical and semantic relations between Hindi words [13]. For each word (lexical item) there is a synonym set, or synset, in the Hindi WordNet, representing one lexical concept. Further, each synset is mapped to a concept ontology that defines the semantic properties of the lexical items of a given synset.

Example
Word: फल
Possible senses
Sense 1: Result
Related cluster snapshot: सफलता [success], द्वीप [island], फल [result], परिणाम [result], असफलता [failure], प्रतिफल [failure]
Sense 2: Fruit
Related cluster snapshot: आम [mango], फल [fruit], भारत [India], खेल [game], मोटर [automobile]

Hence, the verbs and adverbs can be matched against the attributes related to the various senses, and this matching manages the correlation between the segments of the sentences or clauses.

6. CONCLUSION
The proposed methodology improves automated assessment by incorporating a wide range of semantic attributes and grammar checking to overcome the issues related to automated essay evaluation systems. The system has to be evaluated on the basis of dependency information and the supporting information from WordNet about the sense and correctness of the sentences. In future, the size and variety of the corpus have to be increased. Grammar-checking factors other than number and gender agreement are left as future research directions.

7. ACKNOWLEDGMENTS
This work was supported by the Research and Development Laboratory, Department of Computer Science and Engineering at Bhilai Institute of Technology, Durg, Chhattisgarh, India, awaiting sponsorship from suitable funding agencies.

8. REFERENCES
[1] Valenti, S., Neri, F., and Cucchiarelli, A. 2003. An Overview of Current Research on Automated Essay Grading. Journal of Information Technology Education, Volume 2. DIIGA - Universita' Politecnica delle Marche, Ancona, Italy.
[2] Hearst, M. 2000. The Debate On Automated Essay Grading. IEEE Intelligent Systems, 15(5), 22-37.
[3] Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., & Harshman, R. A. 1990. Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6), 391-407.
[4] Whittington, D. & Hunt, H. 1999. Approaches To The Computerized Assessment Of Free Text Responses. In M. Danson (Ed.), Proceedings of the Sixth International Computer Assisted Assessment Conference, Loughborough University, UK.
[5] Williams, R. 2001. Automated Essay Grading: An Evaluation Of Four Conceptual Models. In A. Hermann & M.M. Kulski (Eds.), Expanding Horizons in Teaching and Learning. Proceedings of the 10th Annual Teaching and Learning Forum, Perth: Curtin University of Technology.
[6] Burstein, J., Kukich, K., Wolff, S., Chi, L., & Chodorow, M. 1998. Enriching Automated Essay Scoring Using Discourse Marking. Proceedings of the Workshop on Discourse Relations and Discourse Marking, Annual Meeting of the Association of Computational Linguistics, Montreal, Canada.
[7] Burstein, J., Leacock, C., & Swartz, R. 2001. Automated Evaluation Of Essay And Short Answers. In M. Danson (Ed.), Proceedings of the Sixth International Computer Assisted Assessment Conference, Loughborough University, Loughborough, UK.
[8] Ming, P.Y., Mikhailov, A.A., & Kuan, T.L. 2000. Intelligent Essay Marking System. In C. Cheers (Ed.), Learners Together, Feb. 2000, Ngee Ann Polytechnic, Singapore.
[9] http://www.slideshare.net/singhg77/the-essay-scoring-tool-test-for-hindi
[10] Bharati, A., Sangal, R., Sharma, D.M., and Bai, L. 2006. Anncorra: Annotating Corpora Guidelines For Pos And Chunk Annotation For Indian Languages. Technical Report (TRLTRC-31), LTRC, IIIT-Hyderabad.
[11] https://en.wikipedia.org/wiki/Grammatical_relation
[12] Blackburn, P., and Bos, J. 2005. Representation and Inference For Natural Language: A First Course In Computational Semantics. CSLI Publications. ISBN 1-57586-496-7.
[13] http://www.cfilt.iitb.ac.in/wordnet/webhwn/
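
As an illustration of the local coherence measure that Section 2 attributes to TEST, the following minimal Python sketch scores each pair of adjacent sentences by bag-of-words cosine similarity. The paper does not say which similarity measure TEST uses, so cosine similarity, whitespace tokenisation, the function names `cosine` and `local_coherence`, and the three sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def local_coherence(sentences):
    """Return one similarity score per pair of adjacent sentences."""
    # Assumption: whitespace tokenisation, no stemming or stop-word removal.
    bags = [Counter(s.split()) for s in sentences]
    return [cosine(x, y) for x, y in zip(bags, bags[1:])]


# Hypothetical three-sentence essay; the third sentence changes topic.
essay = [
    "दिवाली रोशनी का त्योहार है",
    "दिवाली का त्योहार सब मनाते हैं",
    "क्रिकेट एक खेल है",
]
scores = local_coherence(essay)
```

The first score is positive because the first two sentences share three words, while the second score is zero because the last two share none. A real system would need morphological normalisation before such counts are meaningful for Hindi.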
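
The four-column HDT output described in Section 3 might be read along the following lines. This is only a sketch: the tab-separated layout, the `<fs ...>` attribute syntax, the attribute names `af` and `name`, the comma-separated contents of `af`, and the sample row itself are assumptions modelled on that description, not data taken from the treebank.

```python
import re
from typing import NamedTuple


class Row(NamedTuple):
    node_id: str    # 1st column: token or chunk id, e.g. "1" or "1.1"
    token: str      # 2nd column: the word or word group
    pos: str        # 3rd column: part-of-speech tag
    features: dict  # 4th column: feature structure as attribute -> value


# Matches attr='value' pairs inside the feature-structure column.
ATTR = re.compile(r"(\w+)='([^']*)'")


def parse_row(line: str) -> Row:
    """Split one tab-separated row into its four columns."""
    cols = line.rstrip("\n").split("\t")
    feature_text = cols[3] if len(cols) > 3 else ""
    return Row(cols[0], cols[1], cols[2], dict(ATTR.findall(feature_text)))


# Hypothetical row: the field order inside 'af' is an assumption.
SAMPLE = "1.1\tराम\tNNP\t<fs af='राम,n,m,sg,3,d,0,0' name='राम'>"
row = parse_row(SAMPLE)
gender, number = row.features["af"].split(",")[2:4]
```

Keeping the feature structure as a plain dictionary makes it straightforward to pass gender and number values on to an agreement check.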
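
The gender and number agreement check discussed in Section 4 can be sketched in Python as follows. The paper proposes machine learning over treebank features; this rule-based comparison is a simplified stand-in, and the `Token` class, the role labels, and the hand-assigned gender and number values for Eg3 are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str
    role: str    # "subject", "predicate" or "other" (hypothetical labels)
    gender: str  # "m", "f" or "any" when unconstrained
    number: str  # "sg", "pl" or "any" when unconstrained


def agreement_errors(tokens):
    """List (form, feature, subject_value, predicate_value) mismatches."""
    subject = next((t for t in tokens if t.role == "subject"), None)
    if subject is None:
        return []
    errors = []
    for tok in tokens:
        if tok.role != "predicate":
            continue
        for feature in ("gender", "number"):
            expected = getattr(subject, feature)
            found = getattr(tok, feature)
            # "any" means the sketch does not constrain this feature.
            if "any" not in (expected, found) and expected != found:
                errors.append((tok.form, feature, expected, found))
    return errors


# Eg3 from Section 4, annotated by hand. The auxiliary is left
# unconstrained because plural forms can also mark respect in Hindi.
eg3 = [
    Token("गोपाल", "subject", "m", "sg"),
    Token("अपने", "other", "any", "any"),
    Token("भाई", "other", "m", "sg"),
    Token("से", "other", "any", "any"),
    Token("लंबी", "predicate", "f", "any"),
    Token("हैं", "predicate", "any", "any"),
]
eg3_errors = agreement_errors(eg3)

# Same sentence with a masculine predicate form.
eg3_fixed = eg3[:4] + [
    Token("लंबा", "predicate", "m", "sg"),
    Token("है", "predicate", "any", "sg"),
]
fixed_errors = agreement_errors(eg3_fixed)
```

For Eg3 the sketch reports a single gender mismatch on “लंबी”, and none for the variant with “लंबा”. In practice the gender and number values would come from the treebank's feature structures instead of being typed in by hand.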
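
The sense selection for “फल” outlined in Section 5 can be illustrated with a simple overlap count between the context words and each sense's related cluster. The two clusters are the snapshots given in the paper (with spellings normalised); the overlap scoring, the whitespace tokenisation, the function name `disambiguate`, and the example sentences are assumptions, and this sketch does not query Hindi WordNet itself.

```python
SENSE_CLUSTERS = {
    "Result": {"सफलता", "द्वीप", "फल", "परिणाम", "असफलता", "प्रतिफल"},
    "Fruit": {"आम", "फल", "भारत", "खेल", "मोटर"},
}


def disambiguate(target, sentence, clusters):
    """Pick the sense whose cluster shares the most words with the context.

    Returns None when no cluster overlaps the context at all. Ties are
    resolved in favour of the sense listed first in `clusters`.
    """
    # The target word itself appears in every cluster, so drop it.
    context = set(sentence.split()) - {target}
    scores = {sense: len(context & words) for sense, words in clusters.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical example sentences.
sense_a = disambiguate("फल", "आम एक मीठा फल है", SENSE_CLUSTERS)
sense_b = disambiguate("फल", "मेहनत का फल सफलता है", SENSE_CLUSTERS)
sense_c = disambiguate("फल", "फल यहाँ है", SENSE_CLUSTERS)
```

Here the first sentence selects the Fruit sense through “आम”, the second selects the Result sense through “सफलता”, and the third gives no decision because its context matches neither cluster.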