<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLEF ProtestNews Lab 2019: Contextualized Word Embeddings for Event Sentence Detection and Event Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriella Skitalinskaya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jonas Kla</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilian Spliethover</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bremen</institution>
          ,
          <addr-line>28359 Bremen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>01</volume>
      <fpage>504</fpage>
      <lpage>508</lpage>
      <abstract>
        <p>In this work we describe the results we achieved in the ProtestNews Lab at CLEF 2019. To tackle the problems of event sentence detection and event extraction we decided to use contextualized string embeddings. The models were trained on a data corpus collected from Indian news sources, but evaluated on data obtained from news sources from other countries as well, such as China. Our models obtained competitive results, placing 3rd in the event sentence detection task and 1st in the event extraction task based on average F1-scores across the different test datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>Contextualized String Embeddings</kwd>
        <kwd>Classification</kwd>
        <kwd>Named Entity Recognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Automated protest news mining can play an important role in analyzing and understanding protests and their media coverage, especially on a global scale. Such research may be able to support different research domains by capturing a protest's evolution over time and identifying the origins of riots and social movements. Additionally, by analyzing news sources from a wide range of countries we can get a better understanding of the worldwide media coverage of protest events.</p>
      <p>The CLEF-2019 ProtestNews Lab [19] tries to tackle this problem and has introduced three shared tasks aimed at identifying and extracting event information from news articles across multiple countries. The aim of the shared tasks is the development of a generalizable text classification and information extraction tool that could be applied to datasets from different countries without additional training.</p>
      <p>The first task can be described as a binary classification task aimed at discriminating between news articles related to protest events and any other news articles. In the second task, the tool should be able to determine whether a sentence is an event sentence, i.e. contains an event trigger or a mention of one. Finally, the third task is a named entity recognition (NER) task focused on extracting various types of information from a given event sentence, such as the location, time and participants of an event. In this paper we only cover the second and third tasks.</p>
      <p>For every task a set of news articles from one country (India) was provided with a predefined training and development split. The resulting models were then evaluated on news articles from the same country as in the training set and on an additional set containing data from another country (China).</p>
      <p>The rest of this paper is organized as follows. We discuss relevant literature
in Section 2. Section 3 gives details on the training dataset and the description
of the proposed approaches. Section 4 provides experimental evaluation, and
important insights gained during our work. We conclude in Section 5, outlining
our contributions and directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>Various Natural Language Processing (NLP) techniques have been utilized in research to automatically analyze and extract information about events from free-form texts, focusing on different types of events and pursuing different goals. We would like to give a few examples of typical approaches used to address the problems of our interest. The authors of [7] use a Conditional Random Fields (CRF) model to evaluate social media posts about a big event in order to gather information about smaller sub-events collocated with the more popular one. The authors do not focus on political events, but try to find a more generalizable approach, applicable to multiple types of events.</p>
      <p>In contrast, [8] focuses on the analysis of activism. Their main approach is to extract event information from natural language text and visualize it afterwards. Instead of social media posts, as in [7], [8] utilize news articles from media outlets.</p>
      <p>In [8] the authors use a simple count of certain linguistic features to determine if a sentence is relevant. Similarly, [9] creates simple hand-crafted rules for different NLP tags (like part-of-speech and named-entity tags) to classify sequences as protest-relevant or not.</p>
      <p>The authors of [10] present an actual use case, using the gathered information to create a protest/demonstration forecast system that is able to predict the occurrence of planned protests by analyzing "open-source documents that appear to indicate civil unrest event planning". They apply simple statistical models for phrase filtering and Probabilistic Soft Logic to identify geographical information.</p>
      <p>It can be seen that most of the considered approaches use simple term representations that either do not take the context of a term into account or only do so for the training set, which means that terms that never occurred in the training set will always be assigned a zero vector. To achieve a better approximation for such words, the authors of [14] train their model to generate representations for parts of words. The idea is to better incorporate subword information and to be able to generate encodings for out-of-vocabulary terms by combining the encodings of different parts of the word.</p>
      <p>Distributed term representations (word embeddings) and a Bi-LSTM have often been used in recent research to solve the tasks of sequence tagging and classification (see for example [13, 17]). Due to the success of this combination and improvements to contextualized word embeddings in recent years, we have chosen to use the Flair embeddings and their language model for all tasks in consideration. The chosen approaches are described in more detail in the following sections.</p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>The provided datasets are separate for each task and consist of newspaper articles taken from Indian and Chinese online newspapers. The training data consists of news articles from Indian sources, whereas the test data is represented by two sets, one containing Indian news sources (test) and the other Chinese news sources (test china). All datasets are in the English language.</p>
        <p>In the case of Task 2, the provided training dataset is imbalanced: 988 sentences have been tagged as protest-related and 4897 as not. Detailed descriptive statistics for each dataset can be found in Table 1.</p>
        <p>For Task 3 the provided train dataset consists of 21623 labeled tokens. The tokens were labeled according to the BIO labeling scheme, where a (B) label indicates the first token of an entity, an (I) label all following tokens of that entity, and an (O) label all tokens outside of entities. Entities of interest included: `participant', `trigger', `loc', `place', `etime', `fname', `target'. A full explanation of what is considered in each entity type can be found in [19].</p>
        <p>In the framework of the ProtestNews Lab we wanted to evaluate how well contextual string embeddings perform in sequence labeling and in classifying protest-related news, and whether models trained on data from one country can be applied to data from other countries.</p>
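        <p>As an illustration of the BIO scheme described above, consider a made-up sentence tagged with the task's `trigger' and `loc' entity types (the tokens, spans and helper function here are our own illustrative examples, not taken from the dataset):</p>
        <preformat>
```python
def bio_tags(tokens, entities):
    """Convert (start, end, type) entity spans over a token list into
    per-token BIO labels. Illustrative helper, not from the shared task."""
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:  # end is exclusive
        tags[start] = f"B-{etype}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"      # continuation tokens
    return tags

tokens = ["Workers", "staged", "a", "protest", "in", "New", "Delhi"]
# 'trigger' marks the event mention, 'loc' the location, per the task's tag set
entities = [(3, 4, "trigger"), (5, 7, "loc")]
print(list(zip(tokens, bio_tags(tokens, entities))))
# "protest" gets B-trigger; "New"/"Delhi" get B-loc/I-loc; the rest are O
```
        </preformat>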
        <p>Task 2. In this task the goal was to classify whether the above-mentioned sentences contain an event trigger or not. To solve this task we chose an approach using contextualized distributed term representations [13] to represent the input text. For this purpose, we stacked different traditional word embeddings such as GloVe [12] and FastText [14] together with the contextualized embeddings generated from Flair language models (LMs), as suggested by [13]. Flair LMs are character-level Bi-LSTMs, pre-trained on the task of predicting the most probable next character in a sequence of characters. Therefore, they encode in their representation all previous and all following words from the given input sequence. In our work, we used the Flair LMs pre-trained on a news corpus (news-forward-fast) and the corresponding inverted corpus (news-backward-fast). We chose the described approach for its ability to capture the context of the input, which proved to be useful for this specific task.</p>
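        <p>Conceptually, stacking embeddings simply concatenates the per-token vectors produced by each embedder along the feature axis (in the Flair library this is what the StackedEmbeddings class does). A minimal numpy sketch with toy dimensions (the 100- and 1024-dimensional sizes are illustrative assumptions, not the exact sizes used):</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens = 7

# Toy stand-ins for the per-token outputs of two embedders:
glove_like = rng.normal(size=(n_tokens, 100))   # classic word embeddings
flair_like = rng.normal(size=(n_tokens, 1024))  # contextualized LM states

# "Stacking" = concatenating each token's vectors along the feature axis
stacked = np.concatenate([glove_like, flair_like], axis=1)
print(stacked.shape)  # (7, 1124): one 1124-dim vector per token
```
        </preformat>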
        <p>In the next step, we used the generated representations for every word in the given input sequence to derive a vector representation for the whole input sequence by using LSTM-based document embeddings [6]. In contrast to pooled document embeddings, which (by default) represent the document by averaging all word representations, LSTM-based document embeddings take the word vectors as input features and are fine-tuned on the specific downstream task; the resulting document embedding is extracted from the last hidden state after fine-tuning [6].</p>
        <p>The resulting document embeddings were used to classify every input sequence as containing an event trigger (1) or not (0). The classification itself was performed by a linear transformation, using a single linear layer with the dimension of the resulting document embeddings.</p>
        <p>We tested both the original as well as preprocessed input sequences, where the preprocessing steps included stopword removal, removal of named entities such as dates and times, and removal of short words. Since the preprocessed sequences led to a significant performance drop, we fine-tuned the Flair LM on the original texts.</p>
        <p>Task 3. The aim of this task is to develop a generalized model to extract event-related information, such as location, time and participants, from given sentences. The sentences are, as mentioned above, taken from online newspapers. The task is framed as a named entity recognition (NER) task and therefore a token labeling problem.</p>
        <p>For Task 3 we chose pooled contextualized embeddings [15] for their better performance compared to the non-pooled version. While the standard version of the contextualized Flair embeddings only accounts for the context of a token within its sentence, the pooled embeddings combine the contexts of all usages of the term in the input and concatenate the resulting vector with the contextualized vector for the sequence of interest. The process of generating the embeddings is described in detail in [15].</p>
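        <p>The pooling idea above can be sketched in plain Python: keep a growing memory of every contextual vector seen for a word, pool the memory (here by averaging, one of the pooling options in [15]), and concatenate the result with the current contextual vector. The 4-dimensional toy vectors are illustrative; the real approach uses Flair LM states:</p>
        <preformat>
```python
import numpy as np

memory = {}  # word: list of contextual vectors seen so far

def pooled_embedding(word, contextual_vec):
    """Concatenate the current contextual vector with the mean of all
    contextual vectors observed for this word so far (mean pooling)."""
    memory.setdefault(word, []).append(contextual_vec)
    pooled = np.mean(memory[word], axis=0)
    return np.concatenate([contextual_vec, pooled])

# The same word in two different contexts gets two different contextual
# vectors; the pooled half accumulates both usages.
v1 = pooled_embedding("delhi", np.array([1.0, 0.0, 0.0, 0.0]))
v2 = pooled_embedding("delhi", np.array([0.0, 1.0, 0.0, 0.0]))
print(v2)  # second half is the mean of both occurrences
```
        </preformat>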
        <p>In both approaches models were trained using the provided train set and
validated on the dev data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <sec id="sec-4-1">
        <title>Task 2. Event sentence detection</title>
        <p>In the framework of the sentence event detection task we submitted two runs experimenting with different minibatch sizes. In both runs we used LSTM-based document embeddings built with stacked GloVe and contextualized string embeddings. In our experiments, we used the hyperparameter settings proposed by the authors in [13]. The only difference between Run 1 and Run 2 is the minibatch size, which was set to 16 and 8 respectively.</p>
        <p>The results obtained by our runs for each dataset, as well as the best results in the track and the baseline provided by the organizers, are presented in Table 2. According to [19], a Linear Support Vector Classification model with stochastic gradient descent learning was selected as the baseline. For the official testing phase the average of the F-scores obtained for each task was used as the performance measure. The second submission was used as our final submission, positioning us in third place.</p>
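        <p>The evaluation measure described above, the mean of per-dataset F1-scores, can be computed directly; the precision and recall values below are placeholders, not our official scores:</p>
        <preformat>
```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-dataset scores (not the official results)
scores = {"test_india": f1(0.80, 0.70), "test_china": f1(0.60, 0.50)}
average_f1 = sum(scores.values()) / len(scores)
print(round(average_f1, 4))  # 0.6461
```
        </preformat>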
        <p>When comparing the results achieved by the first and second run, a decrease in classification quality with the increase in minibatch size can be observed. This may be explained as follows: in cases where models tend to overfit, the gradients calculated with a small batch size are much noisier than gradients calculated with a large batch size, so it is easier for the model to escape from sharp minimizers, which leads to better generalization [16].</p>
        <p>One of the goals of the ProtestNews Lab track is to build a model able to generalize outside of the country domain used for training, making it possible to use the same model to detect event sentences in news sources from other countries. Thus, it is interesting to see not only how well the model performs on data from the two considered countries, but also how big the difference between the achieved results is. In Table 2 it can be seen that in Run 2 the gap between the India test score and the China test score is the lowest, which can indicate a higher cross-country generalization ability of the proposed model.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Task 3. Event extraction</title>
        <p>We submitted two runs experimenting with different standard word embeddings, which were stacked with pooled contextualized string embeddings, as recommended in [15]. Using these embeddings we trained our own sequence tagging models. In the first run we used FastText embeddings [14], whereas in the second run we tried the GloVe embeddings [12]. In our experiments, we used the hyperparameter settings recommended by the authors in [15], as they achieved state-of-the-art performance on other natural language processing tasks.</p>
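        <p>A training setup along these lines can be sketched with the Flair library; the file names, directory paths and hyperparameter values below are illustrative assumptions, not our exact configuration:</p>
        <preformat>
```python
# Illustrative Flair sketch, not the exact training script used for our runs.
from flair.datasets import ColumnCorpus
from flair.embeddings import WordEmbeddings, PooledFlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Hypothetical CoNLL-style files with token and BIO-tag columns
corpus = ColumnCorpus("data/", {0: "text", 1: "ner"},
                      train_file="train.txt", dev_file="dev.txt")

# GloVe (as in Run 2) stacked with pooled contextualized string embeddings [15]
embeddings = StackedEmbeddings([
    WordEmbeddings("glove"),
    PooledFlairEmbeddings("news-forward-fast"),
    PooledFlairEmbeddings("news-backward-fast"),
])

# Bi-LSTM-CRF sequence tagger over the stacked embeddings
tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=corpus.make_tag_dictionary(tag_type="ner"),
                        tag_type="ner", use_crf=True)

ModelTrainer(tagger, corpus).train("taggers/protest-ner",
                                   learning_rate=0.1, mini_batch_size=32,
                                   max_epochs=150)
```
        </preformat>
        <p>Note that training downloads the pre-trained embeddings and is compute-intensive; the sketch is meant to show the shape of the pipeline rather than to be run as-is.</p>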
        <p>The results obtained by our runs for each dataset are presented in Table 3. During the testing phase the average of the F-scores obtained for each dataset was used as the performance measure. The second submission was used as our final submission and earned us first place. It can be seen that there is a considerable difference between the results obtained for the different countries.</p>
        <p>In this paper we tackled the problems of event sentence detection and event tagging in protest-related news articles at the CLEF ProtestNews Lab. The proposed solutions were based on contextualized string embeddings. We achieved the best F-score in extracting relevant information from event-related sentences, and the third-best F-score in classifying sentences from news articles.</p>
        <p>The improvement of the generalization ability of the approach will be the main focus of our future work. We will try other embeddings, such as BERT [18], to further investigate whether an attention-based incorporation of context improves performance.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. BeautifulSoup - bs4, https://www.crummy.com/software/BeautifulSoup/. Last accessed 21 May 2019</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Natural Language Toolkit, https://www.nltk.org/. Last accessed 21 May 2019</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Explosion AI - spaCy, https://www.spacy.io. Last accessed 21 May 2019</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Annotation Specifications - Named Entities, https://spacy.io/api/annotation#namedentities. Last accessed 21 May 2019</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>