=Paper= {{Paper |id=Vol-1819/edudm2017-paper5 |storemode=property |title= |pdfUrl=https://ceur-ws.org/Vol-1819/edudm2017-paper5.pdf |volume=Vol-1819 |authors=Seema Mahato,Ani Thomas |dblpUrl=https://dblp.org/rec/conf/indiaSE/MahatoT17 }} ==== https://ceur-ws.org/Vol-1819/edudm2017-paper5.pdf
LEXICO-SEMANTIC ANALYSIS OF ESSAYS IN HINDI LANGUAGE

Seema Mahato
Research Scholar,
Dr. C.V. Raman University,
BILASPUR (C.G.), INDIA,
seema_mahato@yahoo.co.in

Dr. (Mrs.) Ani Thomas
Professor, Dept. of Computer Applications,
Bhilai Institute of Technology,
DURG (C.G.), INDIA,
tpthomas22@yahoo.com

ABSTRACT
Many researchers regard the essay as a tool for judging learning outcomes and intellectual capabilities, and for assessing organized and integrated thought. The number of university students is growing, and distance and ubiquitous e-learning approaches are spreading. As a result, the use of Computer-based Assessment Systems has risen rapidly over the past decade. Manual grading of students' essays requires a significant amount of time and effort and is an expensive activity for educational institutions, which need a practical solution to this task. Automated essay grading (or evaluation) systems address this need. Nowadays, most online competitive examinations and university examinations aim to evaluate human-written essays both by examiners/teachers and by machines such as automated essay grading systems. Such systems must focus significantly on vocabulary, text syntax and text semantics. This paper surveys existing automated essay grading systems and their underlying technologies. It also proposes a methodology to overcome issues these systems face during evaluation, such as grammatical and semantic errors and the influence of local and regional languages on Hindi essays.

CCS Concepts
•Computing methodologies → Artificial intelligence → Natural language processing → Lexical semantics

Keywords
Automated Essay Grading, AEG, NLP, Text Processing, Essay Evaluation, Semantic Attributes

Copyright © 2017 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1. INTRODUCTION
Most essay grading systems available on the market grade essays written purely in English or in other European languages. India has almost 21 recognized languages and 27 local languages, and their influence can easily be seen in Hindi essays.

The following sentences are examples:
"दिवाली के दिन लक्ष्मी के आगमन के विश्वास के साथ लोग अपने घरो के आँगन मे रंगोली या आलपोना से सजाते है" (On Diwali, believing that Lakshmi will arrive, people decorate the courtyards of their houses with rangoli or alpona).
"राखी के पुण्य पर्व पर घर मे किसी के निधन से यह त्योहार खोटा हो जाता है" (If someone in the household dies on the auspicious festival of Rakhi, the festival becomes 'khota' for that family).
In both, the influence of regional or local languages can be clearly identified.

At present, Computer-based Assessment Systems (CbAS) whose primary focus is the automatic grading and evaluation of Hindi essays are rare. Automated Essay Grading (AEG) systems are also known as Automated Writing Evaluation systems or Automated Essay Assessors. They aim to score students' essays on a specific topic and to give each student feedback on the deficiencies in his/her essay. These systems reduce the burden on evaluators, who are often unable to give personalized attention to each student's needs. Such systems emulate the human capability of reading and assessing writing. They also provide timely feedback that helps writers/students improve their writing skills.

2. STATE OF THE ART
Systems currently available for automated essay assessment include:
• Project Essay Grade (PEG)
• Intelligent Essay Assessor (IEA)
• Educational Testing Service I (ETS I)
• Electronic Essay Rater (E-Rater)
• Bayesian Essay Test Scoring sYstem (BETSY)
• Intelligent Essay Marking System (IEMS)
• Schema Extract Analyse and Report (SEAR)
• The Essay Scoring Tool (TEST) [1]
The working techniques of a few of these AEG systems are discussed below.

PEG is one of the earliest implementations of automated essay grading. It is a statistical approach based on the assumption that the quality of an essay is reflected by measurable "proxes" [2]. PEG distinguishes between "trins" and "proxes". Trins are the intrinsic qualities of writing that a human rater judges, such as fluency and diction. Proxes are computer approximations, or measures, of those trins. For example:
• the length of the essay in words represents the trin of fluency;
• counts of prepositions, relative pronouns and other parts of speech indicate the complexity of sentence structure;
• variation in word length indicates diction.
Previously hand-graded essays are used as a training set to calculate the regression coefficients. These coefficients combine the proxes so that PEG simulates the grading of human raters. PEG does not consider Natural Language Processing (NLP) techniques or lexical content at all.

IEA is a domain-independent tool based on Latent Semantic Analysis (LSA), a technique originally designed for document indexing and text retrieval [3]. LSA represents documents and the words they contain as a large two-dimensional matrix, which serves as a semantic space. A matrix algebra technique known as Singular Value Decomposition (SVD) is applied to this matrix. It uncovers new relationships between words and documents and adjusts existing relationships to represent their true significance more accurately [4][5]. The key features of IEA are a relatively low unit cost, quick customized feedback, and plagiarism detection. The system is very well suited to analyzing and scoring expository essays on science, social studies, history,
medicine or business topics, and it automatically assesses and critiques electronically submitted essays [1].

E-Rater follows a statistical, corpus-based approach. It uses the Microsoft Natural Language Processing tool to parse each essay and extract linguistic features. These features are finally evaluated against a benchmark set of human-graded essays [6][7]. E-Rater performs domain-based analysis of discourse structure, syntactic structure and vocabulary usage. It is composed of five main independent modules:
• three modules identify features for the scoring-guide criteria of syntactic variety, organization of ideas and vocabulary usage;
• the remaining two modules select and weight the predictive features for essay scoring and compute the final score.
A feedback component provides additional feedback on qualities of writing, but only those related to topic and fluency.

IEMS can be used both as an assessment tool and for diagnostic and tutoring purposes in many content-based subjects [8]. It is based on a Pattern Indexing Neural Network (the Indextron). The Indextron is a specific clustering algorithm. It can be implemented as a neural network and embedded in an intelligent tutoring system to grade quickly and provide feedback to students.

TEST is the first domain-based automated essay scoring (AES) tool for Hindi, and it needs prior knowledge of the domain before checking an essay. It uses quality of content, local coherence, factual accuracy and global coherence as scoring parameters [9].

Each sentence in an essay is connected to the preceding sentences, and the degree of this connection measures the coherence of each sentence pair. Local coherence measures inter-sentence similarity. Global coherence classifies the structure of an essay as good, average or bad. TEST takes human-graded essays, labelled as good or bad essays, as training sets.

The fact evaluation module holds topic-specific keywords, a list of essays, a list of correct facts and a list of incorrect facts. It produces individual essay reports and scores as an N × 1 score matrix for internal use by TEST. TEST does not perform grammar checking or spell-checking.

3. HINDI DEPENDENCY TREEBANK
The Hindi Dependency Treebank (henceforth HDT) uses the karaka, a syntactico-semantic relation, as an intermediary step to express semantic relations through vibhaktis (case markers) [10]. Each karaka has a default vibhakti. In linguistics, grammatical relations (also called grammatical functions, grammatical roles or syntactic functions) are functional relationships between constituents in a clause [11]. Grammatical relations play their greatest role in dependency grammars, which tend to posit dozens of distinct grammatical relations. Every head-dependent dependency bears a grammatical function. HDT can support semantic analysis because it includes part-of-speech, chunk and dependency information. For each sentence, the HDT output has the following four columns:
• 1st column: the token or chunk id, such as 1, 1.1, 2, 2.2, etc.
• 2nd column: the actual word or word group in the sentence, with the attribute 'name' used for naming.
• 3rd column: the part of speech.
• 4th column: a feature structure holding morphological information, grammatical roles, semantic information, etc.

4. LEXICAL ANALYSIS OF HINDI SENTENCES
A careful study of existing AEG systems based on AI and machine learning techniques, and of their limitations, has led us to propose a methodology that could work in the Indian context. The methodology is based on a series of semantic evaluations.

To check for grammatical or semantic errors, the HDT analysis of each sentence is captured. In this methodology, features obtained from the treebank are used to build machine learning techniques that identify the errors. The machine learning procedure analyzes each noun, pronoun and verb, together with the postposition associated with it. It also analyzes number and gender agreement between the noun/pronoun and the verb.

Hindi is a free-word-order language, but its default sentence word order is Subject-Object-Verb (SOV). The object may be direct, indirect or both. Where English uses prepositions, Hindi uses postpositions, or vibhaktis (case markers), which combine with a noun, a pronoun or, more generally, a noun phrase. Vibhaktis such as ने, को, से, का, के and में follow the noun or pronoun they mark. Hindi sentences may follow default word-order conventions to encode grammatical relations.

Hindi also has rich morphological case. The subject, the object and other verb arguments are identified by the case markers they bear (e.g. nominative, accusative, dative, genitive, ergative). To be grammatically correct, the subject of a sentence must agree with the finite verb in person, number and gender. A sentence is considered ungrammatical if it contains a syntactic error. Consider the following sentences:
Eg1. राम ने रावण मारा. (Ram killed Ravana.)
Eg2. लड़की स्कूल में जाती हैं. / लड़की स्कूल को जाती हैं. (The girl goes to school.)
Eg3. गोपाल अपने भाई से लंबी हैं । (Gopal is taller than his brother.)

Eg1 is ungrammatical because it lacks "को" after रावण, yet HDT considers it grammatically correct. This is shown by the dependency structures in figures 1, 2 and 3, which correspond respectively to:
• "राम ने रावण मारा"
• "राम ने रावण को मारा"
• "राम रावण को मारा"
In these structures, k1 indicates the Karta karaka (the agent, here bearing the ergative case marker 'ने'). k2 indicates the Karma karaka (the object, here bearing the accusative case marker 'को').
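The Python sketch below illustrates the number and gender agreement check discussed in this section. It is only an illustration under stated assumptions, not the authors' machine-learning procedure.

It assumes a simplified, hand-built representation of a parsed sentence:
• each token carries the first three HDT columns (id, word, part of speech);
• a head id, a dependency label and a small feature dictionary stand in for the fourth column.

The class `Token`, the function `agreement_errors` and the feature names `gender`, `number` and `vib` are hypothetical. The part-of-speech and chunk tags are assigned by hand. Following the text, the predicate "लंबी हैं" of Eg3 is treated as one chunk. The sketch also assumes the standard Hindi rule that the verb does not agree with a karta marked by ने.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    """One word or chunk of a simplified, HDT-like analysis (hypothetical format).

    id, word and pos mirror the first three HDT columns; head, drel and
    feats stand in for parts of the fourth (feature structure) column.
    """
    id: int
    word: str
    pos: str
    head: int = 0   # id of the governing token; 0 marks the root
    drel: str = ""  # dependency relation such as "k1" (karta); "" if omitted
    feats: Dict[str, str] = field(default_factory=dict)


def agreement_errors(tokens: List[Token]) -> List[str]:
    """Return gender/number mismatches between each k1 dependent and its head."""
    by_id = {tok.id: tok for tok in tokens}
    errors: List[str] = []
    for tok in tokens:
        if tok.drel != "k1" or tok.head not in by_id:
            continue
        # A Hindi verb does not agree with a karta marked by the ergative ने,
        # so the subject-verb check is skipped in that case.
        if tok.feats.get("vib") == "ने":
            continue
        head = by_id[tok.head]
        for feature in ("gender", "number"):
            subj_value = tok.feats.get(feature)
            head_value = head.feats.get(feature)
            # Compare only when both sides carry the feature.
            if subj_value and head_value and subj_value != head_value:
                errors.append(
                    f"{feature} mismatch: {tok.word}={subj_value}, "
                    f"{head.word}={head_value}"
                )
    return errors


# Eg3 from the text, with hand-assigned (hypothetical) features.
eg3 = [
    Token(1, "गोपाल", "NNP", head=3, drel="k1", feats={"gender": "m", "number": "sg"}),
    Token(2, "अपने भाई से", "NN", head=3),  # relation omitted in this sketch
    Token(3, "लंबी हैं", "VGF", feats={"gender": "f"}),
]
print(agreement_errors(eg3))  # prints ['gender mismatch: गोपाल=m, लंबी हैं=f']
```

A check of this kind catches only agreement errors. The missing "को" in Eg1 would need a separate rule over case markers, since HDT itself does not flag it.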
Figures 1 to 3 thus show that HDT does not treat the absence of a case marker as a grammatical error. The sentences in Eg2 are grammatical, but they would be more natural without "में" and "को", i.e. "लड़की स्कूल जाती हैं".

In Eg3, "गोपाल अपने भाई से लंबी हैं", the predicate "लंबी" does not agree with the subject "गोपाल". The subject has masculine gender, whereas "लंबी" has feminine gender. Now consider the sentence "चला जाऐंगा अपने आप सुनील" (roughly, "Sunil will go away by himself"), which is also ungrammatical. Hence, checking gender and number agreement helps in lexical analysis.

5. SEMANTIC ANALYSIS
Semantic knowledge provides information such as animacy, named-entity categories and verb selectional restrictions. Named-entity tag information is used to match the category of a pronoun with that of its referent. Semantic class information (noun category) is used for finding facts and for fact evaluation in essays. Pairs whose semantic features do not match are filtered out. Semantic analysis is then performed using the semantic knowledge for each word.

In computational semantics, "semantics" refers to a formal analysis of meaning, and "computational" refers to approaches that in principle support effective implementation [12]. Semantic analysis involves identifying the intended meaning at the word level, i.e. word-sense disambiguation, since a word can have multiple meanings in different contexts. Semantic analysis also helps to understand how different sentences and textual elements fit together. The analysis proceeds in three steps:
• computationally identifying word senses;
• exploring the interrelationships between the elements of a sentence, and the relations between sentences (e.g. coreference);
• examining semantic relations and performing sentiment analysis.

The dependency structures shown in figures 1, 2 and 3 indicate that HDT treats the meaning of these sentences as correct, even though some of them are grammatically incorrect. The dependency structure shows the relations between noun phrases and verb phrases that are semantically interrelated. Semantic knowledge also analyzes multiple words and identifies relations between them, such as hypernymy/hyponymy and meronymy/holonymy.

Hindi WordNet is a system that brings together the lexical and semantic relations between Hindi words [13]. For each word (lexical item), Hindi WordNet contains a synonym set, or synset, representing one lexical concept. Each synset is further mapped to a concept ontology that defines the semantic properties of the lexical items in that synset.

Example
Word: फल
Possible senses:
Sense 1: Result
    Related cluster snapshot: सफलता [success], द्वीप [island], फल [result], परिणाम [result], असफलता [failure], प्रतिफल [reward]
Sense 2: Fruit
    Related cluster snapshot: आम [mango], फल [fruit], भारत [India], खेल [game], मोटर [automobile]
Hence, verbs and adverbs can be matched against the attributes related to the various senses of a word. This helps manage the correlation between segments of a sentence or between clauses.

6. CONCLUSION
The proposed methodology aims to improve automated assessment by incorporating a wide range of semantic attributes together with grammar checking. This is intended to overcome the issues faced by existing automated essay evaluation systems. The system is still to be evaluated on the basis of dependency information and of supporting information from WordNet about word sense and sentence correctness. In future work, the size and variety of the corpus need to be increased. Grammar-checking factors other than number and gender agreement are left as future research directions.

7. ACKNOWLEDGMENTS
This work was supported by the Research and Development Laboratory, Department of Computer Science and Engineering, Bhilai Institute of Technology, Durg, Chhattisgarh, India, and is awaiting sponsorship from suitable funding agencies.

8. REFERENCES
[1] Valenti, S., Neri, F., & Cucchiarelli, A. 2003. An Overview of Current Research on Automated Essay Grading. Journal of Information Technology Education, 2.

[2] Hearst, M. 2000. The Debate On Automated Essay Grading. IEEE Intelligent Systems, 15(5), 22-37.

[3] Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., & Harshman, R. A. 1990. Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6), 391-407.

[4] Whittington, D., & Hunt, H. 1999. Approaches To The Computerized Assessment Of Free Text Responses. In M. Danson (Ed.), Proceedings of the Sixth International Computer Assisted Assessment Conference, Loughborough University, UK.

[5] Williams, R. 2001. Automated Essay Grading: An Evaluation Of Four Conceptual Models. In A. Hermann & M.M. Kulski (Eds.), Expanding Horizons in Teaching and Learning. Proceedings of the 10th Annual Teaching and Learning Forum, Perth: Curtin University of Technology.

[6] Burstein, J., Kukich, K., Wolff, S., Chi, L., & Chodorow, M. 1998. Enriching Automated Essay Scoring Using Discourse Marking. Proceedings of the Workshop on Discourse Relations and Discourse Marking, Annual Meeting of the Association for Computational Linguistics, Montreal, Canada.

[7] Burstein, J., Leacock, C., & Swartz, R. 2001. Automated Evaluation Of Essay And Short Answers. In M. Danson (Ed.), Proceedings of the Sixth International Computer Assisted Assessment Conference, Loughborough University, Loughborough, UK.

[8] Ming, P.Y., Mikhailov, A.A., & Kuan, T.L. 2000. Intelligent Essay Marking System. In C. Cheers (Ed.), Learners Together, Feb. 2000, Ngee Ann Polytechnic, Singapore.

[9] The Essay Scoring Tool (TEST) for Hindi. http://www.slideshare.net/singhg77/the-essay-scoring-tool-test-for-hindi

[10] Bharati, A., Sangal, R., Sharma, D.M., & Bai, L. 2006. AnnCorra: Annotating Corpora Guidelines For POS And Chunk Annotation For Indian Languages. Technical Report (TR-LTRC-31), LTRC, IIIT-Hyderabad.

[11] Grammatical relation. Wikipedia. https://en.wikipedia.org/wiki/Grammatical_relation

[12] Blackburn, P., & Bos, J. 2005. Representation and Inference For Natural Language: A First Course In Computational Semantics. CSLI Publications. ISBN 1-57586-496-7.

[13] Hindi WordNet. http://www.cfilt.iitb.ac.in/wordnet/webhwn/