<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Overview of the FIRE 2020 EDNIL Track: Event Detection from News in Indian Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bhargav Dave</string-name>
          <email>bhargavdave1@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Surupendu Gangopadhyay</string-name>
          <email>surupendu.g@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prasenjit Majumder</string-name>
          <email>prasenjit.majumder@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pushpak Bhattacharya</string-name>
          <email>pushpakbh@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sudeshna Sarkar</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sobha Lalitha Devi</string-name>
          <email>sobha@au-kbc.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AU-KBC Research Centre, MIT Campus of Anna University</institution>
          ,
          <addr-line>Chennai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dhirubhai Ambani Institute of Information and Communication Technology</institution>
          ,
          <addr-line>Gandhinagar</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Indian Institute of Technology Bombay</institution>
          ,
          <addr-line>Mumbai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Indian Institute of Technology Kharagpur</institution>
          ,
          <addr-line>Kharagpur</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>1</volume>
      <fpage>6</fpage>
      <lpage>20</lpage>
      <abstract>
        <p>The goal of the FIRE 2020 EDNIL track was to create a framework that could be used to detect events from news articles in English, Hindi, Bengali, Marathi and Tamil. The track consisted of two tasks: (i) identifying a piece of text in a news article that contains an event (Event Identification), and (ii) creating an event frame from the news article (Event Frame Extraction). The events identified in the Event Identification task were Man-made Disaster and Natural Disaster. In the Event Frame Extraction task, the event frame consists of Event type, Casualties, Time, Place and Reason.</p>
        <p>An event is defined as an occurrence happening in a certain place during a particular interval of time, with or without the participation of human agents. It may be part of a chain of occurrences, an outcome or effect of a preceding occurrence, or a cause of succeeding occurrences. An event can occur naturally or as a result of human actions. An event can have a location, a time, agents involved (the causing agent and the agent on which the effect of the event is felt), etc.</p>
      </abstract>
      <kwd-group>
        <kwd>Multilingual Event Detection</kwd>
        <kwd>Event Identification</kwd>
        <kwd>Event Frame Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Task 1: Event Identification</title>
        <p>In this task the participants had to identify an event given a news article. The events were of two
types: Natural disaster and Manmade disaster.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Task 2: Event Frame Extraction</title>
        <p>In this task the participants had to form an event frame given a news article. The event frame
consists of the following fields:</p>
        <list list-type="order">
          <list-item>
            <p>Type: the type of the event. There are two types of events:</p>
            <list list-type="alpha-lower">
              <list-item><p>Natural disaster</p></list-item>
              <list-item><p>Manmade disaster</p></list-item>
            </list>
          </list-item>
          <list-item>
            <p>Subtype: the specific event, which is a subtype of either Natural disaster or Manmade disaster.</p>
            <p>The subtypes of Natural disaster are forest fire, hurricane, cold wave, tornado, storm,
hail storms, blizzard, avalanches, heat wave, cyclone, drought, heavy rainfall, limnic
eruptions, floods, tsunami, land slide, volcano, earthquake, rock fall, seismic risk, famine,
epidemic and pandemic.</p>
            <p>The subtypes of Manmade disaster are crime, riots, aviation hazard, accidents, train
collision, vehicular collision, transport hazards, industrial accident, fire, normal bombing,
terrorist attack, miscellaneous, shoot out, surgical strikes, suicide attack and armed
conflicts.</p>
          </list-item>
          <list-item><p>Casualties: the number of people injured or killed and the damage to property.</p></list-item>
          <list-item><p>Time: when the event took place.</p></list-item>
          <list-item><p>Place: where the event took place.</p></list-item>
          <list-item><p>Reason: why and how the event took place.</p></list-item>
        </list>
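        <p>As an illustration only, the event frame listed above can be pictured as a simple record. The following Python sketch is not part of the track's tooling: the class name EventFrame, the field names and the choice of lists of strings as values are assumptions made here, and the way the sample sentence is split into Time and Place values is a guess rather than gold annotation.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventFrame:
    """Hypothetical container for one event frame (names are assumptions)."""
    event_type: str   # "Natural disaster" or "Manmade disaster"
    subtype: str      # e.g. "accidents" or "earthquake"
    casualties: List[str] = field(default_factory=list)
    time: List[str] = field(default_factory=list)
    place: List[str] = field(default_factory=list)
    reason: List[str] = field(default_factory=list)


# Illustrative values only; the argument boundaries are not taken from the dataset.
frame = EventFrame(
    event_type="Manmade disaster",
    subtype="accidents",
    time=["around 6.30 pm"],
    place=["Manathoor Church junction"],
)
print(frame)
```
]]></preformat>
        <p>Fields for which an article gives no information simply stay empty in this sketch, so a frame can be built incrementally as arguments are found.</p>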
        <p>
          Shared tasks on event detection have been proposed earlier. In the TAC-KBP 2016 Event
Nugget track [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], the task was to detect an event and then link the words that refer to that
event in English, Spanish and Chinese articles. In FIRE 2018 EventXtract-IL [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], the task
was to detect an event and also extract arguments such as location, cause and effect from Hindi and
Tamil news articles. In the CLEF 2019 Lab ProtestNews [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], the task was to detect protest news
and form an event frame (Event, Participant, Target, Place, Time) from English news articles.
        </p>
        <p>The contribution of EDNIL is that it provides an annotated dataset for event detection in
five Indian languages, i.e. English, Hindi, Bengali, Marathi and Tamil.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <p>The dataset was created as part of the project “A Platform for Cross-lingual and Multilingual
Event Monitoring in Indian Languages”<fn id="fn1"><label>1</label><p>https://imprint-india.org/knowledge-portal-5592-a-platform-for-crosslingual-and-multilingual-eventmonitoring-in-indian-languages</p></fn>. The dataset consists of news articles in English,
Hindi, Bengali, Marathi and Tamil, which have been collected from different news
agencies. The document statistics of the dataset are shown in Table 1.</p>
      <p>The news articles of each language were annotated manually by annotators from IIT Kharagpur
(Bengali), IIT Bombay (Marathi), IIT Patna (Hindi) and AU-KBC (English and Tamil). The annotation
was done at word level, and the annotated news articles are stored in XML format.
The XML tags are described below and the statistics of the XML tags are shown in
Table 2.</p>
      <preformat>&lt;MAN_MADE_EVENT ID="number" TYPE="subtype"&gt;
  Event Trigger
&lt;/MAN_MADE_EVENT&gt;
&lt;NATURAL_EVENT ID="number" TYPE="subtype"&gt;
  Event Trigger
&lt;/NATURAL_EVENT&gt;</preformat>
      <p>Here the MAN_MADE_EVENT and NATURAL_EVENT tags correspond to Manmade disaster and
Natural disaster events respectively. Each tag encloses the event trigger and has the following attributes:</p>
      <list list-type="order">
        <list-item><p>ID: a number which is unique for each event/tag in a given document.</p></list-item>
        <list-item><p>TYPE: the subtype of the particular event (Manmade disaster or Natural disaster).</p></list-item>
      </list>
      <p>The event Manmade disaster has the subtypes crime, riots, aviation hazard, accidents, train
collision, vehicular collision, transport hazards, industrial accident, fire, normal bombing,
terrorist attack, miscellaneous, shoot out, surgical strikes, suicide attack and armed conflicts.
Language-wise statistics of the subtypes of the man-made event XML tag are shown in Table 3.
The event Natural disaster has the subtypes forest fire, hurricane, cold wave, tornado, storm, hail
storms, blizzard, avalanches, heat wave, cyclone, drought, heavy rainfall, limnic eruptions,
floods, tsunami, land slide, volcano, earthquake, rock fall, seismic risk, famine, epidemic and
pandemic. Language-wise statistics of the subtypes of the natural disaster event XML tag are shown
in Table 4.</p>
      <p>The event arguments are the casualties, the reason, the time of occurrence and the location of the event.
The XML tag for each event argument is given below:</p>
      <list list-type="order">
        <list-item><p>&lt;CASUALTIES-ARG&gt;: contains the words that describe the casualties that
occurred due to an event.</p></list-item>
        <list-item><p>&lt;TIME-ARG&gt;: contains the words that give the time at which the event occurred.</p></list-item>
        <list-item><p>&lt;PLACE-ARG&gt;: contains the words that give the place at which the event
occurred.</p></list-item>
        <list-item><p>&lt;REASON-ARG&gt;: contains the words that give the reason for which the event
occurred.</p></list-item>
      </list>
      <p>For example, the “casualties” argument of an event is annotated as follows:</p>
      <preformat>&lt;CASUALTIES-ARG ID="number"&gt;
  casualties
&lt;/CASUALTIES-ARG&gt;</preformat>
      <p>Each argument tag of an event has the attribute “ID”, which is a unique number for each tag in
a given news article.</p>
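      <p>To make the tag scheme concrete, the following Python sketch reads event and argument tags from an annotated snippet with the standard library. It is a minimal illustration under stated assumptions, not the organisers' tooling: the wrapping root element named DOC, the function name extract_annotations and the placement of the tags in the sample sentence are all hypothetical, and the casualties tag is spelled as in the annotation example above.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sentence; the tag placement is an assumption, not gold data.
SAMPLE = (
    '<DOC>The <MAN_MADE_EVENT ID="1" TYPE="accidents">accident</MAN_MADE_EVENT> '
    'occurred <TIME-ARG ID="2">around 6.30 pm</TIME-ARG> at '
    '<PLACE-ARG ID="3">Manathoor Church junction</PLACE-ARG>.</DOC>'
)

EVENT_TAGS = {
    "MAN_MADE_EVENT": "Manmade disaster",
    "NATURAL_EVENT": "Natural disaster",
}
ARG_TAGS = {"CASUALTIES-ARG", "TIME-ARG", "PLACE-ARG", "REASON-ARG"}


def extract_annotations(xml_text):
    """Return (events, arguments) found in one annotated document."""
    root = ET.fromstring(xml_text)
    events, arguments = [], {}
    for elem in root.iter():
        text = "".join(elem.itertext()).strip()
        if elem.tag in EVENT_TAGS:
            events.append({
                "id": elem.get("ID"),
                "type": EVENT_TAGS[elem.tag],
                "subtype": elem.get("TYPE"),
                "trigger": text,
            })
        elif elem.tag in ARG_TAGS:
            # Group argument strings by tag name.
            arguments.setdefault(elem.tag, []).append(text)
    return events, arguments


events, arguments = extract_annotations(SAMPLE)
print(events)
print(arguments)
```
]]></preformat>
      <p>The sketch assumes each annotated file is well-formed XML with a single root element; files that do not satisfy this would need to be wrapped or cleaned before parsing.</p>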
      <p>An example annotation of the man-made event news “The accident occurred around 6.30 pm
at Manathoor Church junction on the Pala-Thodupuzha State Highway.” is shown in Fig. 1, and
an example annotation of the natural event news “An earthquake measuring 5.5 on the Richter</p>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation</title>
      <p>In both Task 1 and Task 2 the evaluation metric was the F1-score. The F1-score was
calculated separately for each of the five languages in both Task 1 and Task 2. For Task 2 the F1-score
was calculated separately for each argument in the event frame and the scores were then
averaged. When evaluating the arguments in the event frame, only exact string matches of the
values were counted. For example, if the PLACE argument in a test article is New Delhi and the
participant's method outputs Delhi as the PLACE argument for that article, it was
not considered a match.</p>
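      <p>The scoring rule described above can be sketched as follows, with F1 = 2PR / (P + R) for precision P and recall R. This is a minimal sketch and not the official evaluation script: the function names, the use of sets of strings, the dictionary layout of a frame and the sample TIME value are assumptions, and the track description does not say how scores were aggregated over documents.</p>
      <preformat><![CDATA[
```python
def f1_exact_match(gold, predicted):
    """F1 over sets of argument strings; only exact string matches count."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def frame_f1(gold_frame, pred_frame, fields):
    """Average the per-argument F1 scores over the given frame fields."""
    scores = [
        f1_exact_match(gold_frame.get(name, []), pred_frame.get(name, []))
        for name in fields
    ]
    return sum(scores) / len(scores)


# Illustrative frames; "Monday" is a made-up value for the example.
gold = {"PLACE": ["New Delhi"], "TIME": ["Monday"]}
pred = {"PLACE": ["Delhi"], "TIME": ["Monday"]}

# "Delhi" is not an exact match for "New Delhi", so PLACE scores 0.0.
print(f1_exact_match(gold["PLACE"], pred["PLACE"]))
# PLACE scores 0.0 and TIME scores 1.0, so the average is 0.5.
print(frame_f1(gold, pred, fields=("PLACE", "TIME")))
```
]]></preformat>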
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>For the first task, Event Identification, we received seven runs from five
teams for English, five runs from three teams for Hindi and
six runs from four teams for Bengali. For Marathi and Tamil we received two
runs from two teams each.</p>
      <p>For the second task, Event Frame Extraction, we received three runs
from three teams for English. For each of Hindi, Bengali, Marathi and Tamil we
received one run from one team. The submission statistics are shown in Table 5. The results for
the five languages are shown in Tables 6, 7, 8 and 9.</p>
      <p>
        Team 3Idiots [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] ranked first in both Task 1 and Task 2 across all languages. They represented the news articles with
n-gram and regex-based features and used these features
in a CRF model for both Task 1 and Task 2. The CRF model was trained
separately for each language.
      </p>
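      <p>To give a feel for what n-gram and regex-based token features can look like, the following Python sketch builds one feature dictionary per token. It is only an assumed example and does not reproduce the feature set of team 3Idiots: the feature names, the two regular expressions and the boundary markers are all hypothetical.</p>
      <preformat><![CDATA[
```python
import re

DIGIT_RE = re.compile(r"\d")
TIME_RE = re.compile(r"^\d{1,2}[.:]\d{2}$")  # matches tokens such as "6.30" or "18:45"


def token_features(tokens, i):
    """Features for tokens[i]: word n-grams with neighbours plus regex flags."""
    lowered = [t.lower() for t in tokens]
    prev_word = lowered[i - 1] if i > 0 else "<BOS>"
    next_word = lowered[i + 1] if i + 1 < len(tokens) else "<EOS>"
    return {
        "unigram": lowered[i],
        "prev_bigram": prev_word + " " + lowered[i],
        "next_bigram": lowered[i] + " " + next_word,
        "has_digit": bool(DIGIT_RE.search(tokens[i])),
        "looks_like_time": bool(TIME_RE.match(tokens[i])),
        "is_capitalised": tokens[i][:1].isupper(),
    }


def sentence_features(tokens):
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


tokens = "The accident occurred around 6.30 pm".split()
features = sentence_features(tokens)
print(features[4])
```
]]></preformat>
      <p>Per-token dictionaries of this kind are the input format accepted by some CRF toolkits, for example sklearn-crfsuite, which would then be trained once per language with word-level labels derived from the annotation tags.</p>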
      <p>Team BUDDI_SAP<fn id="fn2"><label>2</label><p>Anand Subramanian, Praveen Kumar Suresh and Sharafath Mohamed were not able to submit a paper due to prior
commitments but gave a presentation at FIRE 2020.</p></fn> ranked second in both tasks for English. They used DistilBERT-based
word embeddings, POS-tag-based embeddings and character-level embeddings, which
were concatenated to represent a word. This representation was passed through a Bi-LSTM,
whose output was passed through a fully connected layer that predicted the words
associated with an argument. Two separate models were trained for Task 1 and Task 2.</p>
      <p>
        Run numbers 3, 2 and 1 of team ComMA [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] were ranked second, third and fourth respectively
for Task 1 in Hindi and Bengali, and third, fourth and fifth for Task 1 in
English. In run number 3, XLM-RoBERTa was used for text representation of all three languages
and was then fine-tuned for Task 1. In run number 2, DistilBERT was used
for text representation of all three languages and was then fine-tuned for Task 1. In run
number 1, BERT was used for text representation of all three languages and was then
fine-tuned for Task 1.
      </p>
      <p>
        Team MUCS [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] ranked second in Task 1 for Marathi and Tamil, fifth in
Task 1 for Hindi and Bengali, and sixth in Task 1 for English. They
used a Linear SVC based on character n-grams and suffix and prefix features of tokens for all five
languages of Task 1.
      </p>
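      <p>Character n-gram and affix features of the kind mentioned above can be sketched in a few lines of Python. This is an illustration under our own assumptions and not the MUCS implementation: the function names, the n-gram range of 2 to 3 and the maximum affix length of 3 are hypothetical choices.</p>
      <preformat><![CDATA[
```python
def char_ngrams(token, n_min=2, n_max=3):
    """All character n-grams of the token for n_min <= n <= n_max."""
    return [
        token[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    ]


def affix_features(token, max_len=3):
    """Prefixes and suffixes of the token, up to max_len characters each."""
    feats = {}
    for k in range(1, min(max_len, len(token)) + 1):
        feats[f"prefix{k}"] = token[:k]
        feats[f"suffix{k}"] = token[-k:]
    return feats


print(char_ngrams("flood"))
# ['fl', 'lo', 'oo', 'od', 'flo', 'loo', 'ood']
print(affix_features("flood"))
# {'prefix1': 'f', 'suffix1': 'd', 'prefix2': 'fl', 'suffix2': 'od',
#  'prefix3': 'flo', 'suffix3': 'ood'}
```
]]></preformat>
      <p>Because these features operate on characters rather than on language-specific vocabularies, the same extraction code can be reused for all five languages before the features are vectorised and passed to a linear classifier.</p>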
      <p>
        Team NLP@ISI [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] ranked sixth and seventh for Bengali and English respectively
in Task 1 and third in Task 2 for English. They used a bag-of-words approach to
identify the disaster event and string-based keyword matching to identify arguments
such as Casualty and Reason.
      </p>
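      <p>A bag-of-words event identifier driven by keyword lists can be sketched as below. The sketch is purely illustrative and is not the NLP@ISI system: the keyword lists, the function names and the scoring by keyword counts are assumptions, and the simple tokenizer only handles text written in the Latin alphabet.</p>
      <preformat><![CDATA[
```python
import re
from collections import Counter

# Hypothetical keyword lists; not the lexicon used by any participating team.
DISASTER_KEYWORDS = {
    "Natural disaster": {"earthquake", "flood", "cyclone", "tsunami"},
    "Manmade disaster": {"accident", "riot", "bombing", "collision"},
}


def bag_of_words(text):
    """Lower-cased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def identify_event(text):
    """Return the event type whose keywords occur most often, or None."""
    bag = bag_of_words(text)
    scores = {
        label: sum(bag[word] for word in keywords)
        for label, keywords in DISASTER_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(identify_event("The accident occurred around 6.30 pm at Manathoor Church junction."))
print(identify_event("An earthquake measuring 5.5 on the Richter"))
```
]]></preformat>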
    </sec>
    <sec id="sec-5">
      <title>5. Concluding Discussions</title>
      <p>The FIRE 2020 EDNIL track succeeded in releasing a multilingual dataset of Indian languages
for event detection. As the result tables show, for Task 1 there
is still considerable scope to improve the F1 scores for all languages except English, and for Task 2 there is
considerable scope for improvement in all languages. In the future we plan to extend the task by
introducing event linking, which will link one event to another if they are related.
For evaluation, we intend to consider partially matching strings along with fully matching strings.
We also plan to introduce an event summarization task, in which a summary with a short
description of the events within a particular time period will be generated.
However, this task will require annotators who can create a gold-standard dataset of
event-based summaries, which may take a significant amount of time.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The track organizers thank all the participants for their interest in this track. We also thank
the FIRE 2020 organizers for their support in organizing the track. We thank the Principal
Investigator, Co-Principal Investigators and Host Institute (IIT Kharagpur) of “A Platform for
Cross-lingual and Multilingual Event Monitoring in Indian Languages” for giving us
the opportunity to use the dataset in the track. We also thank the Ministry of Electronics and
Information Technology (MeitY) and the Ministry of Human Resource Development, Government
of India, for the opportunity to develop the dataset and other resources.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>WIP event detection system at TAC KBP 2016 event nugget track</article-title>
          ,
          <source>TAC</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P. R. K.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Devi</surname>
          </string-name>
          ,
          <article-title>EventXtract-IL: Event extraction from newswires and social media text in Indian languages @ FIRE 2018 - an overview</article-title>
          , in: P. Mehta, P. Rosso, P. Majumder, M. Mitra (Eds.),
          <source>Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation, Gandhinagar, India, December 6-9, 2018</source>
          , volume
          <volume>2266</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2018</year>
          , pp.
          <fpage>282</fpage>
          -
          <lpage>290</lpage>
          . URL:
          <ext-link ext-link-type="uri" xlink:href="http://ceur-ws.org/Vol-2266/T5-1.pdf">http://ceur-ws.org/Vol-2266/T5-1.pdf</ext-link>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hürriyetoğlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yörük</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yüret</surname>
          </string-name>
          , Ç. Yoltar,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gürel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Duruşan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Mutlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Akdemir</surname>
          </string-name>
          ,
          <article-title>Overview of CLEF 2019 Lab ProtestNews: Extracting protests from news in a cross-context setting</article-title>
          , in: F. Crestani,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Savoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rauber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. Heinatz</given-names>
            <surname>Bürki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cappellato</surname>
          </string-name>
          , N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>425</fpage>
          -
          <lpage>432</lpage>
          .
          doi:
          <pub-id pub-id-type="doi">10.1007/978-3-030-28577-7_32</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <article-title>Non-neural Structured Prediction for Event Detection from News in Indian Languages</article-title>
          , in: P. Mehta,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          , M. Mitra (Eds.),
          <source>Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation, Hyderabad, India, December 16-20, 2020</source>
          , CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B. L. Ritesh</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ojha</surname>
          </string-name>
          ,
          <article-title>CoMA@FIRE 2020: Exploring Multilingual Joint Training across different Classification Tasks</article-title>
          , in: P. Mehta,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          , M. Mitra (Eds.),
          <source>Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation, Hyderabad, India, December 16-20, 2020</source>
          , CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          ,
          <article-title>An Approach for Event Detection from News in Indian Languages using Linear SVC</article-title>
          , in: P. Mehta,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          , M. Mitra (Eds.),
          <source>Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation, Hyderabad, India, December 16-20, 2020</source>
          , CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Basak</surname>
          </string-name>
          ,
          <article-title>Event Detection from News in Indian Languages Using Similarity Based Pattern Finding Approach</article-title>
          , in: P. Mehta,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          , M. Mitra (Eds.),
          <source>Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation, Hyderabad, India, December 16-20, 2020</source>
          , CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>