=Paper=
{{Paper
|id=Vol-1587/T4-1
|storemode=property
|title=ESM-IL: Entity Extraction from Social Media Text for Indian Languages @ FIRE 2015 - An Overview
|pdfUrl=https://ceur-ws.org/Vol-1587/T4-1.pdf
|volume=Vol-1587
|authors=Pattabhi RK Rao,Malarkodi CS,Vijay Sundar Ram R,Sobha Lalitha Devi
|dblpUrl=https://dblp.org/rec/conf/fire/RaoMRD15
}}
==ESM-IL: Entity Extraction from Social Media Text for Indian Languages @ FIRE 2015 - An Overview==
Pattabhi RK Rao, Malarkodi CS, Vijay Sundar Ram R, Sobha Lalitha Devi
AU-KBC Research Centre, MIT Campus of Anna University, Chrompet, Chennai, India
+91 44 22232711
pattabhi@au-kbc.org, csmalarkodi@au-kbc.org, sundar@au-kbc.org, sobha@au-kbc.org

ABSTRACT

Entity recognition is an important subtask of information extraction and finds applications in information retrieval, machine translation and other higher-level Natural Language Processing (NLP) tasks such as co-reference resolution. Entities are real-world elements or objects such as person names, organization names, product names and location names, and are often referred to as named entities. Entity extraction is the automatic identification of named entities in a text document: given a document, entities such as person, organization, location and product names are identified and tagged. We observe that in the Indian language scenario there is no social media text corpus which could be used to develop automatic systems. Entity recognition and extraction has gained increased attention in the Indian research community, but there is no benchmark data on which the resulting systems can be compared for the respective languages. Towards this we organized the Entity Extraction in Social Media Text track for Indian Languages (ESM-IL) at the Forum for Information Retrieval Evaluation (FIRE). We present an overview of the ESM-IL 2015 track: this paper describes the corpus created for Hindi, Malayalam, Tamil and English, and gives an overview of the approaches used by the participants.

CCS Concepts

• Computing methodologies ~ Artificial intelligence
• Computing methodologies ~ Natural language processing
• Information systems ~ Information extraction
Keywords

Entity Extraction; Social Media Text; Twitter; Indian Languages; Tamil; Hindi; Malayalam; English; Named Entity Annotated Corpora for Twitter.

1. INTRODUCTION

Over the past decade, Indian language content on media such as websites, blogs, email and chats has increased significantly, and with the advent of smart phones more people use social media such as Twitter and Facebook to comment on people, products, services, organizations and governments. Content growth is driven by people from non-metros and small cities who are mostly more comfortable in their mother tongue than in English, and Indian language content is expected to grow by more than 70% every year. There is therefore a great need to process this huge volume of data automatically. Companies in particular are interested in ascertaining the public view of their products and processes, which requires natural language processing systems that recognize entities, their associations and the relations between them. Hence an automatic entity extraction system is required.

The objectives of this evaluation are:

- creation of benchmark data for entity extraction in Indian language social media text;
- development of Named Entity Recognition (NER) systems for Indian language social media text;
- identification of the best suited machine learning techniques.

Entity extraction has been actively researched for over 20 years. Most of the research has, however, focused on resource-rich languages such as English, French and Spanish. The scope of this work is named entity recognition in social media text (Twitter data) for Indian languages. In the past, events such as the Workshop on NER for South and South East Asian Languages (NER-SSEA, 2008) and the Workshop on South and Southeast Asian Natural Language Processing (SANLP, 2010 and 2011) were conducted to bring the various strands of NER research onto a single platform. The NERIL tracks at FIRE (Forum for Information Retrieval Evaluation) in 2013 and 2014 contributed benchmark data and boosted research on NER for Indian languages. All of these efforts used newswire text.

User-generated texts such as Twitter and Facebook posts are diverse and noisy: they contain non-standard spellings, abbreviations and unreliable punctuation. Apart from these writing-style and language challenges, another challenge is concept drift (Dredze et al., 2010; Fromreide et al., 2014): the distribution of language and topics on Twitter and Facebook shifts constantly, degrading the performance of NLP tools over time. There is thus a need to develop systems that focus on social media texts.

Research on analyzing social media data has been taken up for English through various shared tasks. The language identification in tweets (TweetLID) shared task held at SEPLN 2014 addressed identifying tweets in six different languages. SemEval 2013, 2014 and 2015 held shared task tracks on sentiment analysis in tweets, with two sub-tasks: contextual polarity disambiguation and message polarity classification. For Indian languages, Amitav et al. (2015) organized a shared task titled 'Sentiment Analysis in Indian Languages' as part of MIKE 2015, covering sentiment analysis of tweets in Hindi, Bengali and Tamil. Named entity recognition in Twitter was explored in a shared task on noisy user-generated text organized by Microsoft as part of ACL-IJCNLP 2015, with two sub-tasks, Twitter text normalization and named entity recognition for English; the NER sub-task used ten tags for annotating the text.

The paper is organized as follows: Section 2 describes the challenges of named entity recognition for Indian languages, Section 3 describes the corpus annotation, the tag set and the corpus statistics, Section 4 gives an overview of the approaches used by the participants, and Section 5 concludes the paper.

2. CHALLENGES IN INDIAN LANGUAGE ENTITY EXTRACTION

The challenges in developing entity extraction systems for Indian language social media text arise from several factors. One of the main factors is that no annotated social media data is available for any of the Indian languages; the earlier initiatives concentrated on newswire text. Apart from the lack of annotated data, the factors which differentiate Indian languages from European languages are the following:

a) Morphologically rich – Indian languages are morphologically rich and agglutinative; root word identification is hence difficult and requires morphological analyzers.

b) Ambiguity – Ambiguity between common and proper nouns. For example, the common word "Roja", meaning rose flower, is also a person name.

c) Spell variations – One of the major challenges is that different people spell the same entity differently. For example, in Tamil the person name Roja is spelt "rosa" and "roja". Similarly, Hindi shows many spelling variations: words such as "mumbai", "gaandhi", "sambandh" and "thanda" each have at least three spelling variants. (A toy sketch of clustering such variants follows at the end of this section.)

d) Less resources – Most Indian languages are resource-poor: there are no automated tools for the preprocessing tasks NER requires, such as part-of-speech tagging and chunking, that can handle social media text.

Apart from these challenges, building automatic entity recognition systems is difficult for the following reasons:

i) Tweets contain a huge range of distinct named entity types. Almost all of these types (except persons and locations) are relatively infrequent, so even a large sample of manually annotated tweets contains very few training examples.

ii) Twitter's 140-character limit means tweets often lack sufficient context to determine an entity's type without background or world knowledge.

iii) In comparison with English, Indian languages have more dialectal variation, influenced mainly by region and community.

iv) Indian language tweets are multilingual in nature and predominantly contain English words.

The following examples illustrate the use of English words and of spoken, dialectal forms in tweets.

Example 1 (Tamil):

Ta: Stamp veliyittu ivaga ativaangi …
En: stamp released these_people get_beaten …
Ta: othavaangi … kadasiya kovai
En: get_slapped … at_end kovai
Ta: pooyi pallakaatti kuththu vaangiyaachchu.
En: gone show_tooth punch got

("They released a stamp, got slapping and beating … at the end reached Kovai and got punched on the face.")

This is a Tamil tweet written in a particular dialect which also uses English words.

Example 2 (Malayalam):

ML: ediye … ente utuppu teechcho? illa
En: hey … my dress ironed? no
ML: chetta … raavile_tanne engottaa?
En: brother … morning_itself where?
ML: tekkati … teechchaale parayullo?
En: hey_iron_it … only_after_ironing tell?

("Hey, did you iron my dress? No. Brother, where are you going this early in the morning? Hey iron it … only after ironing will you tell?")

This is a Malayalam tweet written in spoken form, where the phrase "teekku ati" is written as "tekkati", its spoken form. This makes it resemble a place name, creates ambiguity, and makes understanding difficult.
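As a toy illustration of the spell-variation problem in item (c), the following sketch (ours, not part of the track infrastructure) clusters candidate spellings by Levenshtein edit distance so that variants map onto one canonical form; the variant and canonical lists are invented for the example.

```python
# Toy illustration of the spell-variation challenge: map variant entity
# spellings to the closest canonical form by edit distance.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

# Hypothetical variant and canonical lists for the demonstration.
variants = ["mumbai", "mumbay", "bombay", "gandhi", "gaandhi", "ghandi"]
canonical = ["mumbai", "gandhi"]
for v in variants:
    best = min(canonical, key=lambda c: edit_distance(v, c))
    print(f"{v:10s} -> {best} (distance {edit_distance(v, best)})")
```

A real system would of course need transliteration-aware similarity rather than plain edit distance, but the sketch shows why surface forms alone are unreliable keys for Indian language entities.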
3. CORPUS DESCRIPTION

The corpus was collected using the Twitter API in two different time periods: the training partition was collected during May–June 2015 and the test partition during August–September 2015. As explained in the sections above, Twitter data exhibits concept drift, so data was deliberately collected in two different periods in order to evaluate how well the systems handle it. In this initiative the corpus is available for three Indian languages, Hindi, Malayalam and Tamil. A corpus for English is also provided, so that researchers can compare their results on the respective Indian languages against English. Figures 1–5 below show different aspects of the corpus statistics.
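The paper specifies only that "the twitter API" was used for collection, not a library or endpoint. As a hypothetical sketch of that step, the code below uses the tweepy 3.x wrapper around the historical v1.1 search endpoint (tweepy 4.x renamed `API.search` to `API.search_tweets`); the credentials and query are placeholders, and since that endpoint returned only roughly the previous week of tweets, a real two-period collection would have run live during each window.

```python
# Hypothetical sketch of per-period tweet collection with tweepy 3.x.
# Credentials and the query are placeholders, not from the paper.
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def collect(query: str, lang: str, n: int):
    """Collect up to n tweets for one language during the current window.

    The v1.1 search endpoint reached only ~7 days back, so the training
    (May-June 2015) and test (Aug-Sep 2015) partitions would each have
    been gathered by running a collector like this during that period.
    """
    return [status.text
            for status in tweepy.Cursor(api.search, q=query,
                                        lang=lang, count=100).items(n)]

tamil_tweets = collect("#chennai", lang="ta", n=5000)  # placeholder query
```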
3.1 ANNOTATION TAGSET

The corpus for each language was annotated manually by trained experts. The named entity recognition task requires the entities mentioned in a document to be detected, their sense to be disambiguated, the attributes to be assigned to each entity to be selected, and the entity to be represented with a tag. Defining the tag set is therefore a very important aspect of this work. The tag set chosen should cover the major classes or categories of entities, and should be usable at both coarse and fine-grained levels depending on the application; a hierarchical tag set is hence the most suitable. Though most prior work has used the Automatic Content Extraction (ACE) NE tag set, we have used a different one: the ACE tag set is fine-grained towards the defence/security domain, whereas the Government of India standardized tag set used here is more generic. This hierarchical tag set was developed at AU-KBC Research Centre and standardized by the Ministry of Communications and Information Technology, Govt. of India, and it is widely used in the Cross Lingual Information Access (CLIA) and Indian Language – Indian Language Machine Translation (IL-IL MT) consortium projects.

In this tag set the named entity hierarchy is divided into three major classes: Entity Name, Time and Numerical expressions. The Name hierarchy has eleven attributes, and the Numeral Expression and Time hierarchies have four and three attributes respectively. Person, Organization, Location, Facilities, Cuisines, Locomotives, Artifact, Entertainment, Organisms, Plants and Diseases are the eleven types of named entities. Numerical expressions are categorized as Distance, Money, Quantity and Count; Time, Year, Month, Date, Day, Period and Special day are the Time expressions. The tag set has a three-level hierarchy: the top (first) level has 22 tags, the second level 49 tags and the third level 31 tags, for a total of 102 tags in the schema. However, the data provided to the participants carried only the first level of the hierarchy, i.e. the 22 top-level tags; the other levels were hidden, to make it somewhat easier for the participants to develop their systems using machine learning methods.
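The coarse versus fine-grained use of the tag set can be sketched as follows. The three top-level classes and the 22 first-level tags below are taken from the paper; the dot-separated encoding and the sample second-level tag are our own placeholders, since the finer levels are not enumerated in this overview.

```python
# Sketch of coarse vs. fine tagging with the hierarchical tag set.
FIRST_LEVEL = {
    "Entity Name": ["PERSON", "ORGANIZATION", "LOCATION", "FACILITIES",
                    "CUISINES", "LOCOMOTIVES", "ARTIFACT", "ENTERTAINMENT",
                    "ORGANISMS", "PLANTS", "DISEASES"],
    "Numerical Expressions": ["DISTANCE", "MONEY", "QUANTITY", "COUNT"],
    "Time Expressions": ["TIME", "YEAR", "MONTH", "DATE", "DAY",
                         "PERIOD", "SPECIAL_DAY"],
}
assert sum(len(v) for v in FIRST_LEVEL.values()) == 22  # matches the paper

def suppress_finer_tags(tag: str, level: int = 1) -> str:
    """Truncate a (hypothetically dot-separated) hierarchical tag to its
    first `level` components, as was done for the data released to teams."""
    return ".".join(tag.split(".")[:level])

print(suppress_finer_tags("LOCATION.CITY"))  # -> "LOCATION" (sub-tag invented)
```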
[Figure 1. Corpus statistics – number of tweets and entities in each language]
[Figure 2. Entity distribution – English]
[Figure 3. Entity distribution – Hindi]
[Figure 4. Entity distribution – Malayalam]
[Figure 5. Entity distribution – Tamil]

3.2 DATA FORMAT

The participants were provided with the annotation markup in a separate annotation file; the raw tweets were to be downloaded separately using the Twitter API. The annotation file is a tab-separated column-format file consisting of the following columns:

i) Tweet_ID
ii) User_Id
iii) NE_TAG
iv) NE raw string
v) NE Start_Index
vi) NE_Length

For example:

Tweet_ID: 123456789012345678
User_Id: 1234567890
NE_TAG: ORGANIZATION
NE Raw String: SonyTV
Index: 43
Length: 6

The index column is the starting character position of the NE, calculated per tweet with the count starting from 0. The participants were also instructed to provide their test file annotations in the same format as the training data.
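A minimal reader for this annotation format might look as follows; the column names are from the paper, while the type and function names are our own.

```python
# Minimal sketch of reading the ESM-IL annotation file described above:
# one NE mention per line, six tab-separated columns.
from typing import NamedTuple, List

class Mention(NamedTuple):
    tweet_id: str
    user_id: str
    ne_tag: str
    ne_string: str
    start: int    # character offset within the tweet text, counted from 0
    length: int

def load_annotations(path: str) -> List[Mention]:
    mentions = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            tid, uid, tag, raw, start, length = line.split("\t")
            mentions.append(Mention(tid, uid, tag, raw, int(start), int(length)))
    return mentions

# A mention can then be projected back onto the separately downloaded tweet:
# tweet_text[m.start : m.start + m.length] should equal m.ne_string.
```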
4. SUBMISSION OVERVIEWS

For this evaluation exercise we used precision, recall and F-measure, which are the metrics widely used for this task. A total of 10 teams registered for the track, of which 7 were able to submit systems for evaluation, contributing 17 test runs in total. All teams participated in English and Hindi, except for one team which participated only in English; three teams participated in Tamil and two in Malayalam.

We also developed a base system, with no preprocessing of the data and no lexical resources: it was trained with CRFs on the raw data alone, without any other features. This base system was built so as to enable a better comparative study. The following paragraphs briefly describe the approach used by each team; all team results, along with the base system results, are given in Table 2.
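For concreteness, here is a hedged sketch of a CRF tagger along the lines of the base system and the CRF-based entries, written with the sklearn-crfsuite wrapper rather than the CRF++/CRFSuite toolkits the teams actually used. It assumes tokenized tweets whose offset annotations have been converted to token-level B/I/O labels, a preprocessing step the track format leaves to participants; the toy data is invented.

```python
import sklearn_crfsuite
from sklearn_crfsuite import metrics

def token_features(tokens, i):
    # The base system trained on "just the raw data"; participant systems
    # added POS, chunk, affix and capitalization features to dicts like this.
    w = tokens[i]
    return {"word.lower": w.lower(), "prefix3": w[:3], "suffix3": w[-3:],
            "is_title": w.istitle(), "has_digit": any(c.isdigit() for c in w)}

def featurize(sentences):
    return [[token_features(toks, i) for i in range(len(toks))]
            for toks in sentences]

# Toy stand-in for the ESM-IL training corpus (labels use first-level tags).
train_sents  = [["SonyTV", "launches", "a", "new", "show", "in", "Chennai"]]
train_labels = [["B-ORGANIZATION", "O", "O", "O", "O", "O", "B-LOCATION"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
crf.fit(featurize(train_sents), train_labels)
pred = crf.predict(featurize(train_sents))

# Precision, recall and F-measure -- the track's evaluation metrics.
print(metrics.flat_classification_report(
    train_labels, pred, labels=["B-ORGANIZATION", "B-LOCATION"]))
```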
The data being generic, this could be (http://www.aclweb.org/anthology/I/I08/I08-03) used for developing generic systems upon which a domain [10] Pattabhi RK Rao, CS Malarkodi, Vijay Sundar R and specific system could be built after customization. Sobha Lalitha Devi. 2014. Proceedings of Named-Entity 6. ACKNOWLEDGMENTS Recognition Indian Languages track at FIRE 2014. http://au- We thank the FIRE 2015 organizers for giving us the opportunity kbc.org/nlp/NER-FIRE2014/ to conduct the evaluation exercise. 7. REFERENCES [1] Arkaitz Zubiaga, Iñaki San Vicente, Pablo Gamallo, José Ramom Pichel Campos, Iñaki Alegría Loinaz, Nora Aranberri, Aitzol Ezeiza, Víctor Fresno. 2014 TweetLID@SEPLN 2014, Girona, Spain, September 16th, 2014. CEUR Workshop Proceedings 1228, CEUR-WS.org 2014 [2] Mark Dredze, Tim Oates, and Christine Piatko. 2010. “We’re not in kansas anymore: detecting domainchanges in streams”. In Proceedings of the 2010 Conferenceon Empirical Methods in Natural LanguageProcessing, pages 585–595. Association for ComputationalLinguistics. 78 Table 1. Participant Team Overview - Summary Team Languages Approaches (ML Features Used Resources/Tools used &System method) Used Submissions Pallavi et al., English – 2 runs, CRFs – CRF++ tool POS, Chunk, Statistical Cleaned data to remove Hindi – 3 runs, kit suffixes and prefixes URLs, emoticons Hindustan Institute of Tamil – 2 runs Technology and Science, For English preprocessing Chennai open source tool ‘pattern.en’ For Hindi used open source tool nltr.org K Sarkar English – 1 run HMM POS tag POS Tagger – Monty Tagger – open source tool Jadavpur University, Kolkata Gazetteer List Shriya et al., English – 1 run, SVM - Machine POS, Chunk, Statistical Gazetteer list, Hindi – 1 run, Learning Suffixes, Statistical Amritha Vishwa Malayalam – 1 run prefixes, Heuristics For preprocessing used Vidyapeetam, Coimbatore Tamil – 3 runs such as capitalization NLTK, Gimpel POS tagger information, Gazetteer for English. NLTK for list, Shape feature Hindi Developed in house tools for POS and Chunking for Tamil and Malayalam Brown Cluster for English Sanjay et al., English – 2 Runs, CRFs – CRF++ tool POS, Chunk, Statistical Gazetteer list, was used Suffixes, Statistical Amritha Vishwa Hindi – 1 run, prefixes, Heuristics For preprocessing used Vidyapeetam, Coimbatore NLTK, Gimpel POS tagger Malayalam – 1 run such as capitalization for English. NLTK for information, Gazetteer Tamil – 2 Runs list, Shape feature Hindi Developed in house tools for POS and Chunking for Tamil and Malayalam Brown Cluster for English Chintak et al., English – 2 runs CRFs – CRFSuite POS, Chunk, Gazetteer Gazetteer list, tool was used information, heuristics LDRP Institute, Gujarat Hindi – 2 runs POS Tagger – RDR open source tool, Chunker – Genia tagger Vira et al, English – 1 run, CRFs – CRFSuite Word structures, English – Stanford NLp tool was used statistical suffixes and tool Charotar University of Hindi – 1 run prefixes, heuristic Science and Technology, features using Hindi – RDR tool Gujarat postpositions Sombuddha et al., English – 4 runs CRFs, Naïve Bayes, POS, Window of POS tagger open source – MIRA, Decision Words, heuristics ark-tweet-nlp tool Jadavpur University tree- J-48 features 79 Table 2. 
Table 2. Evaluation Results (per language: Precision / Recall / F-measure)

Team / Run | Hindi (P / R / F) | Tamil (P / R / F) | Malayalam (P / R / F) | English (P / R / F)
Base System | 73.05 / 34.81 / 47.10 | 56.50 / 11.46 / 19.05 | 62.58 / 20.82 / 31.24 | 73.54 / 28.01 / 40.56
Shriya (Amritha) – Run1 | 71.56 / 54.09 / 61.61 | 55.23 / 11.03 / 18.39 | 51.18 / 40.29 / 45.08 | 58.78 / 40.73 / 48.11
Shriya (Amritha) – Run2 | – | 61.55 / 19.82 / 29.98 | – | –
Shriya (Amritha) – Run3 | – | 60.82 / 19.42 / 29.44 | – | –
Sanjay (Amritha) – Run1 | 74.65 / 5.26 / 9.83 | 70.11 / 19.81 / 30.89 | 60.05 / 39.94 / 47.97 | 46.78 / 24.90 / 32.50
Sanjay (Amritha) – Run2 | – | 54.87 / 18.91 / 28.13 | – | 46.88 / 25.64 / 33.15
Chintak (LDRP) – Run1 | 67.11 / 0.76 / 1.51 | – | – | 7.30 / 4.17 / 5.31
Chintak (LDRP) – Run2 | 74.73 / 46.84 / 57.59 | – | – | 5.35 / 5.67 / 5.50
K Sarkar (JU) – Run1 | – | – | – | 61.96 / 39.46 / 48.21
Vira (Charotar Univ) – Run1 | 25.65 / 16.14 / 19.82 | – | – | 4.13 / 3.39 / 3.72
Pallavi (HITS) – Run1 | 81.21 / 44.57 / 57.55 | 70.42 / 14.13 / 23.54 | – | 50.48 / 32.03 / 39.19
Pallavi (HITS) – Run2 | 80.86 / 44.25 / 57.20 | 64.52 / 22.14 / 32.97 | – | 50.21 / 37.06 / 42.64
Pallavi (HITS) – Run3 | 81.49 / 41.58 / 55.06 | – | – | –
Sombuddha (JU)** – Run1 | – | – | – | 46.92 / 32.41 / 38.34
Sombuddha (JU)** – Run2 | – | – | – | 58.09 / 31.85 / 41.15
Sombuddha (JU)** – Run3 | – | – | – | 49.10 / 31.59 / 38.45
Sombuddha (JU)** – Run4 | – | – | – | 58.09 / 31.85 / 41.15

** Though this team submitted Hindi runs, they were disqualified because the data format did not conform to the task guidelines.