<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Event Sentence Detection Task Using Attention Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ali Safaya</string-name>
          <email>alisafaya@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sakarya University</institution>
          ,
          <addr-line>Adapazarı, Sakarya 54055</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>This paper describes and evaluates a model for event sentence detection in news articles that combines an attention mechanism with a bidirectional Gated Recurrent Unit (GRU) network and word embeddings. The model was developed for the event sentence detection task of the competition organized by the ProtestNews lab at CLEF 2019. We also evaluated the generalizability of NLP tools by training our model on data from one country and testing it on data from another country. The model achieved the highest score in the competition, with an average F1-score of 0.6547.</p>
      </abstract>
      <kwd-group>
        <kwd>Information extraction</kwd>
        <kwd>Natural language processing</kwd>
        <kwd>Sequence classification</kwd>
        <kwd>Event sentence detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This task aims to identify and label sentences that contain protest events in news
articles. It follows the document labeling task, which identifies news articles that contain
protest events as defined in the Event Labeling Annotation Manual [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Once the
news reports are classified as containing a protest event, what remains is to identify
where in the article the relevant event information is presented. In this task,
we analyze the sentences of the protest news articles one by one and classify them
as event sentences or non-event sentences.
      </p>
      <p>Event sentences, which are labeled as 1, should contain an explicit reference to
a protest event that makes the document eligible for classification as a protest
article. Such a reference can be any word or phrase that denotes the event in question: it can
be a direct expression of the event or a pronoun that stands for the event. The
sentence must clearly indicate that the event has definitely happened in the past
or is ongoing.</p>
      <p>Non-event sentences, i.e. those labeled as 0, are the ones that do not
contain any event reference in the past, the present, or the future.</p>
      <p>
        The main goal of this task is to set a baseline for evaluating the generalizability of
NLP tools. The proposed setting facilitates testing and improving state-of-the-art
methods for text classification and information extraction on English news article texts
from India and China. The ProtestNews lab works towards developing
generalizable information systems that perform comparably well on texts from
multiple countries [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Data Collection and Methodology</title>
      <p>
        The ProtestNews lab organizing committee collected online English news articles from
India and China. The annotation process started by labeling each article in a sample of news
articles as containing a protest or not; these labels are used for Task 1. The sentences of the
positively labeled documents were then labeled as containing protest information or not.
A sentence must contain either an event trigger or a reference to an event trigger
in order to be labeled as positive [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Our deep-learning-based model was trained on the Training-India set and validated
on the Validation-India set. The data from China was not involved in training at
all, so evaluating the model on the test sets gives an independent estimate of how well
it generalizes.</p>
      <p>The F1-score (1) was used as the evaluation metric for this task, as it gives a more
reliable assessment for tasks of this kind, in which the numbers of
negative and positive samples are unequal.</p>
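      <p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}</tex-math>
        </disp-formula>
      </p>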
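      <p>As a minimal sketch of the F1-score in Eq. (1), the following Python function computes precision, recall, and F1 from binary labels. The function name <monospace>f1_score_binary</monospace> and the toy label lists are illustrative assumptions; this is not the evaluation script used by the lab.</p>
      <preformat>
```python
def f1_score_binary(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels, where 1 = event sentence."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy, made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
precision, recall, f1 = f1_score_binary(y_true, y_pred)
print(precision, recall, f1)
```
      </preformat>
      <p>Because true negatives do not appear in the formula, a classifier that labels every sentence as a non-event sentence receives an F1-score of 0 in this sketch, even if most sentences are in fact non-event sentences.</p>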
      <sec id="sec-2-1">
        <title>Preprocessing and Tokenization</title>
        <p>Before being fed into the model, the text samples (sentences taken from the
articles) were cleaned and split into lower-case words.</p>
        <p>Because word embeddings were used in the classification process, a word index was
created from the embedding vocabulary in use. Once the word sequences were
obtained, some irrelevant words were dropped; which words to drop was determined
by the word index.</p>
        <p>The model requires sequences of a fixed length, so the sequence length was
set to 35 tokens. Longer sequences were truncated, and shorter sequences were
pre-padded with the 0-indexed token until they reached 35 tokens.</p>
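        <p>The preprocessing steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original pipeline: the regular-expression cleaning rule, the toy <monospace>word_index</monospace>, the choice to drop exactly those words that are missing from the index, and the choice to keep the first 35 tokens when truncating are all assumptions.</p>
        <preformat>
```python
import re

MAX_LEN = 35   # fixed sequence length stated in the text
PAD_INDEX = 0  # index reserved for the padding token


def encode_sentence(sentence, word_index, max_len=MAX_LEN):
    """Lower-case, tokenize, map to indices, drop unknown words, truncate, pre-pad."""
    # Assumed cleaning rule: keep runs of letters, digits, and apostrophes only.
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    # Assumption: words missing from the word index are the ones that get dropped.
    ids = [word_index[tok] for tok in tokens if tok in word_index]
    # Assumption: truncation keeps the first max_len tokens.
    ids = ids[:max_len]
    # Pre-padding: zeros go in front of the real tokens.
    return [PAD_INDEX] * (max_len - len(ids)) + ids


# Hypothetical toy word index; the real one is built from the embedding vocabulary.
word_index = {"police": 1, "protesters": 2, "marched": 3, "city": 4}
encoded = encode_sentence("Protesters marched through the city!", word_index)
print(len(encoded), encoded[-3:])
```
        </preformat>
        <p>With pre-padding, the real tokens sit at the end of the sequence, so a recurrent layer reads them last, immediately before its final state.</p>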
      </sec>
      <sec id="sec-2-2">
        <title>Word Embeddings</title>
        <p>
          To feed these word sequences to our deep learning model, every token had to be
represented by a value or a vector. This model uses word-level representations,
so every word was replaced by its embedding vector. The embedding vectors were
taken from Google’s pretrained set [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which was trained using the word2vec [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
algorithm on part of the Google News dataset (about 100 billion words) and contains
300-dimensional vectors for 3 million English words. In this work, only the 400,000 most frequent
words were included in the word index.
        </p>
        <p>Every token was replaced by a 300-dimensional vector, and the maximum sequence length
was limited to 35 tokens, so every sentence was represented by a matrix of shape (300,
35).</p>
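        <p>The embedding lookup can be illustrated with a small NumPy sketch. Random vectors stand in for the pretrained word2vec vectors, the vocabulary is reduced to five entries, and an all-zero row for the padding index is an assumption. The lookup naturally yields one row per token, i.e. shape (35, 300); its transpose has the (300, 35) orientation quoted in the text.</p>
        <preformat>
```python
import numpy as np

EMBED_DIM = 300  # dimensionality of the pretrained word2vec vectors
MAX_LEN = 35     # fixed sequence length

# Stand-in for the pretrained vectors: random values for a tiny vocabulary.
# Row 0 is the all-zero vector assumed here for the padding index.
rng = np.random.default_rng(0)
vocab_size = 5
embedding_matrix = rng.normal(size=(vocab_size, EMBED_DIM))
embedding_matrix[0] = 0.0

# A pre-padded index sequence of length 35 (three real tokens at the end).
token_ids = np.array([0] * (MAX_LEN - 3) + [2, 3, 4])

# Embedding lookup: one 300-dimensional row per token position.
sentence_matrix = embedding_matrix[token_ids]
print(sentence_matrix.shape)    # (35, 300), one row per token
print(sentence_matrix.T.shape)  # (300, 35), the orientation quoted in the text
```
        </preformat>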
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Bidirectional Gated Recurrent Unit (GRU)</title>
      <p>
        In sequence classification tasks, Recurrent Neural Networks (RNNs) and their variants
have long been the state-of-the-art tools. After obtaining an embedding for each sample,
the main approach is to use a bidirectional GRU [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. As Fig. 1 shows, every layer of a
bidirectional GRU consists of GRU cells for each direction.
Every cell has two gates: an update gate and a reset gate (Fig. 2). The sigma symbols in the
figure denote these gates, which allow a GRU to carry information over many
time steps so that it can influence a later time step.
      </p>
      <p>
        Attention models were first introduced in 2015 by Bahdanau et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Earlier
conventional methods extracted features from text through keyword extraction,
since some words are more helpful than others in determining the category of a text.
However, such methods do not fully use the sequential structure of the text. With deep
learning methods the sequence structure can be taken into account, but the ability to give
higher weight to more important words is lost.
      </p>
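      <p>The gated cell described at the start of this section can be sketched in NumPy as follows. This is a minimal illustration of a standard GRU step with an update gate and a reset gate, run in both directions and concatenated; it is not the trained model. The parameter names, the random toy weights, and the particular gate convention (the update gate interpolates between the previous state and the candidate state) are assumptions, and other formulations swap the roles of z and 1 - z.</p>
      <preformat>
```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU time step with an update gate z and a reset gate r."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])  # reset gate
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate state
    # The update gate interpolates between the old state and the candidate.
    return (1.0 - z) * h_prev + z * h_cand


def init_params(input_dim, units, rng):
    """Random toy parameters; a trained model would learn these."""
    p = {}
    for gate in "zrh":
        p["W" + gate] = rng.normal(scale=0.1, size=(units, input_dim))
        p["U" + gate] = rng.normal(scale=0.1, size=(units, units))
        p["b" + gate] = np.zeros(units)
    return p


def run_gru(xs, p, units):
    """Run a GRU over xs of shape (time, input_dim); return all hidden states."""
    h = np.zeros(units)
    states = []
    for x in xs:
        h = gru_step(x, h, p)
        states.append(h)
    return np.stack(states)


def bidirectional_gru(xs, p_fwd, p_bwd, units):
    """Concatenate a left-to-right pass with a right-to-left pass per time step."""
    forward = run_gru(xs, p_fwd, units)
    backward = run_gru(xs[::-1], p_bwd, units)[::-1]  # re-align to original order
    return np.concatenate([forward, backward], axis=-1)


rng = np.random.default_rng(0)
xs = rng.normal(size=(35, 300))  # one embedded sentence, one row per token
outputs = bidirectional_gru(xs, init_params(300, 128, rng), init_params(300, 128, rng), 128)
print(outputs.shape)  # (35, 256)
```
      </preformat>
      <p>When z is close to 0 the previous state is copied forward almost unchanged, which is how the cell can carry information across many time steps.</p>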
      <p>
        The originally proposed model was intended for machine translation; the use of
attention mechanisms for text classification was proposed in a paper
written jointly by CMU and Microsoft in 2016 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In the authors' words:
"Not all words contribute equally to the representation of the sentence meaning. Hence,
we introduce attention mechanism to extract such words that are important to the
meaning of the sentence and aggregate the representation of those informative words
to form a sentence vector."
      </p>
      <p>In this model (see Fig. 3), the attention layer is added after the last GRU layer, so
the output of the attention layer is the dot product of the attention similarity vector s<sub>i</sub> and
the GRU cell outputs a<sub>i</sub>.</p>
      <p>Footnote 1: http://colah.github.io/posts/2015-08-Understanding-LSTMs/, last accessed June 2019.</p>
      <p>The main goal is to compute a score (s<sub>i</sub>) for every word in the text, which is the attention
similarity score of that word. Fig. 4 shows how these scores are calculated.</p>
      <p>Footnote 2: https://mlwhiz.com/blog/2019/03/09/deeplearning_architectures_text_classification/, last
accessed June 2019.</p>
      <p>
        These final scores are then multiplied by the GRU outputs for the corresponding words, weighting them
according to their importance. The weighted outputs are then summed and passed through
dense layers to the final output function [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
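      <p>The scoring and weighting steps described in this section can be sketched in NumPy as follows, using the word-attention form of the cited 2016 paper: a tanh projection of each GRU output, a similarity with a context vector, and a softmax over time steps. The names <monospace>W</monospace>, <monospace>b</monospace>, and <monospace>u</monospace>, the random toy parameters, and the feature size of 128 are assumptions made for illustration; in a trained model these parameters are learned.</p>
      <preformat>
```python
import numpy as np


def attention_with_context(a, W, b, u):
    """Weight GRU outputs a (time, features) by learned importance scores.

    A hidden projection is compared with a context vector u, the similarities
    are normalised with a softmax, and the outputs are summed with those weights.
    """
    hidden = np.tanh(a @ W + b)                     # (time, features)
    logits = hidden @ u                             # (time,) similarity to the context vector
    logits = logits - logits.max()                  # stabilise the softmax numerically
    s = np.exp(logits) / np.exp(logits).sum()       # attention scores s_i, summing to 1
    sentence_vector = (s[:, None] * a).sum(axis=0)  # weighted sum of the outputs a_i
    return s, sentence_vector


rng = np.random.default_rng(0)
time_steps, features = 35, 128
a = rng.normal(size=(time_steps, features))            # stand-in for GRU outputs
W = rng.normal(scale=0.1, size=(features, features))   # toy parameters, normally learned
b = np.zeros(features)
u = rng.normal(scale=0.1, size=features)

s, sentence_vector = attention_with_context(a, W, b, u)
print(s.shape, sentence_vector.shape)  # (35,) (128,)
```
      </preformat>
      <p>The time axis disappears in the weighted sum, so the dense layers that follow receive a single fixed-size vector per sentence regardless of which words carried the most weight.</p>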
    </sec>
    <sec id="sec-4">
      <title>Modeling The Network and Evaluation</title>
      <p>The model is shown in Fig. 4. The embedding layer is followed by two bidirectional
GRU layers with 128 and 64 cells, respectively. On top of them, an
Attention with Context layer is added, followed by a dense layer of 64 nodes with
ReLU activation. Finally, an output layer with a single sigmoid node
produces the binary classification output.</p>
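      <p>To make the layer stack easier to follow, the sketch below traces the output shape of each layer for a single input sentence. It is plain Python rather than a model definition, and it assumes that each bidirectional layer returns the full sequence and concatenates its forward and backward states (doubling the width); the layer names are illustrative.</p>
      <preformat>
```python
def layer_shapes(seq_len=35, embed_dim=300, gru_units=(128, 64), dense_units=64):
    """Return (layer name, output shape) pairs for one input sentence."""
    shapes = [("embedding", (seq_len, embed_dim))]
    for i, units in enumerate(gru_units, start=1):
        # Assumption: forward and backward states are concatenated, doubling the width.
        shapes.append((f"bidirectional_gru_{i}", (seq_len, 2 * units)))
    # Attention collapses the time axis into a single weighted sum.
    shapes.append(("attention_with_context", (2 * gru_units[-1],)))
    shapes.append(("dense_relu", (dense_units,)))
    shapes.append(("output_sigmoid", (1,)))
    return shapes


for name, shape in layer_shapes():
    print(name, shape)
```
      </preformat>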
      <p>
        The model was trained on the Training-India dataset (see Table 1) for 8 epochs using
Nadam [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] as the optimizer and was validated on the Validation-India dataset.
      </p>
      <p>When the model was evaluated on the test datasets (see Table 2), its
performance (F1-score) dropped from 0.70 on the Test-India dataset, which comes from the same
source as the training data, to 0.60 on the Test-China dataset.</p>
      <p>(*) Kept hidden by the lab's organizing committee.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this task, we developed a deep learning model that detects event
sentences in news articles. The proposed model uses a bidirectional GRU with an attention
mechanism. The results obtained with this model were the highest in the competition
organized by the ProtestNews lab.</p>
      <p>This experiment also shows the effect of local data on NLP tools: our test
results on datasets from the same source as the training set were noticeably higher than
those on datasets from other sources.</p>
      <p>As future work, part-of-speech (POS) features of the words in the sentences could be evaluated
by adding one more input layer in parallel with the embedding layer.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. ProtestNews lab Homepage, https://emw.ku.edu.tr/clef-protestnews-2019/, last accessed 23.05.2019.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Hürriyetoğlu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yörük</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yüret</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Yoltar, Ç.,
          <string-name>
            <surname>Gürel</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duruşan</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mutlu</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          (
          <year>2019</year>
          , April).
          <article-title>A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries</article-title>
          .
          <source>In European Conference on Information Retrieval</source>
          (pp.
          <fpage>316</fpage>
          -
          <lpage>323</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          Google Code,
          <article-title>word2vec page</article-title>
          , https://code.google.com/archive/p/word2vec/, last accessed 24.05.2019.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G. S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>In Proceedings of Workshop at ICLR</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Neural Machine Translation By Jointly Learning To Align And Translate</article-title>
          .
          <source>In Proceedings of ICLR</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <article-title>Hierarchical Attention Networks for Document Classification</article-title>
          . Carnegie Mellon University, Microsoft Research, Redmond
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. MLWhiz,
          <article-title>NLP Learning Series: Part 3 - Attention, CNN and what not for Text Classification</article-title>
          , https://mlwhiz.com/blog/2019/03/09/deeplearning_architectures_text_classification/, last accessed 23.05.2019.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dozat</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <source>Incorporating Nesterov Momentum into Adam</source>
          , http://cs229.stanford.edu/proj2015/054_report.pdf, last accessed 24.05.2019.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>