<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j.knosys.2016.07.013</article-id>
      <title-group>
        <article-title>A Comparative Study on Generalizability of Information Extraction Models on Protest News</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Erkan Basar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simge Ekiz</string-name>
          <email>sekizg@floodtags.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antal van den Bosch</string-name>
          <email>a.vandenbosch@let.ru.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Language Studies, Radboud University</institution>
          ,
          <addr-line>P.O. Box 9103, 6500 HD Nijmegen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>FloodTags</institution>
          ,
          <addr-line>Binckhorstlaan 36, M2.11, 2516 BE, The Hague</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>1</volume>
      <abstract>
        <p>Information extraction applications can help social scientists obtain the information they need to understand the reasons behind certain social dynamics. Many recent state-of-the-art information extraction approaches are based on supervised machine learning, which can recognize information whose patterns resemble those seen during training. Recognizing relevant information with never-seen patterns, however, is still a challenging task. In this study, we design a Recurrent Neural Network (RNN) architecture employing ELMo embeddings and residual Bidirectional Long Short-Term Memory layers to address this challenge in the context of the CLEF 2019 ProtestNews shared task. Furthermore, we train a classical Conditional Random Fields (CRF) model as our strong baseline to contrast a state-of-the-art classical machine learning approach with a recent neural network method, both in performance and in generalizability. We show that the RNN model outperforms the classical CRF model and shows better promise on generalizability.</p>
      </abstract>
      <kwd-group>
        <kwd>information extraction</kwd>
        <kwd>recurrent neural networks</kwd>
        <kwd>conditional random fields</kwd>
        <kwd>word embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Social science studies can benefit from analyzing and comparing protest event
information from multiple countries to understand the reasons behind certain
social dynamics such as emerging welfare regimes. Although online mass media
agencies report major incidents within a day, manually collecting such
amounts of data is time-consuming, expensive and hard to maintain. Automating
the process with natural language processing (NLP) allows us to harness such
information on a large scale and at high speed.</p>
      <p>
        Many recent state-of-the-art NLP approaches are based on supervised
machine learning, where the machine discovers patterns in manually
prepared gold-standard data to learn how to detect relevant information. With
recent developments, it is possible to teach information extraction models to
detect information that has patterns similar to the training data at reasonable
levels. However, building a model that can correctly recognize relevant data
with never-observed patterns remains a challenging task. The generalizability
of a model especially affects the results when it is used on cross-cultural text. Even
though the text is written in the same language the model was trained on,
cultural differences can affect the context and the language usage, such as word
choice and ordering. This issue can cause semantic bias [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], potentially undermine
the model, render it useless, and thus eventually force us to build models
specific to cultures and regions.
      </p>
      <p>The CLEF organizers prepared the ProtestNews Task 3: Event information
extraction shared task [14], stressing both the importance of automatically extracting
protest event information from news and the impact of generalizability in
natural language processing applications. The aim of the shared task is to
develop generalizable NLP tools that are robust enough to be used regardless of
their data source, with a focus on detecting and extracting relevant information
from protest news.</p>
      <p>We design a Recurrent Neural Network (RNN) using residual Bidirectional
Long Short-Term Memory (BiLSTM) layers with pretrained Embeddings from
Language Models (ELMo) word embeddings [20], and a classical Conditional
Random Fields (CRF) [15] based machine learning model. We adopt the CRF-based
model as a strong baseline and compare both the performance and the
generalizability of the two models. We thereby contrast one of the classical
machine learning approaches with one of the latest methods. We aim for
this automation to be as accurate as possible, knowing that our methods will
probably not be as precise as human annotation.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Information Extraction (IE) is the task of automatically extracting structured
information from unstructured text and is studied as part of the natural language
processing area [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Previous studies demonstrated that end-to-end information
extraction systems can be developed to analyze news media data by employing
probabilistic approaches [
        <xref ref-type="bibr" rid="ref10">23, 10</xref>
        ]. In the ProtestNews shared task, event
information extraction is designed as an entity sequence labelling task. The goal
of sequence labelling can be described as assigning a single categorical class to
each sequence of relevant words in the text.
      </p>
      <p>
        A classical machine learning framework for labelling sequential data is
linear-chain Conditional Random Fields [15]. CRF avoids the label bias problem that
occurs when there is uncertainty in the previous tag of the sequence [19]. The
strength of CRF also comes from its ability to deal with arbitrary,
overlapping features of the input [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. CRF is accepted as one of the
state-of-the-art approaches, and CRF-based models are applied to sequence labelling tasks [
        <xref ref-type="bibr" rid="ref8">24,
8</xref>
        ], including some of the state-of-the-art named entity recognition tools [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        In recent years, studies have shown promise with Recurrent Neural
Networks [16]. RNNs based on Long Short-Term Memory (LSTM) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] can learn
temporal dependencies between the elements of a sequence and also when to forget them. Moreover,
networks based on Bidirectional LSTM [13] read the data once from the
beginning to the end and once from the end to the beginning, which lets them
learn stronger relations. Furthermore, neural networks have come to be used in
feature extraction to generate word representations that are effective in supervised
sequence labelling problems [22]. In common state-of-the-art approaches, word
representations are generated over individual tokens [17]. ELMo [20] is one of
the recent state-of-the-art word representation networks; it generates
embeddings over entire word sequences instead of individual tokens, providing
an advantage in sequence-based tasks.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Dataset</title>
      <p>In this study, we use only the dataset provided by the ProtestNews shared task
organizers in our experiments. The provided data is already split into
training and development sets, containing 250 and 36 English news sentences, respectively,
from newspapers published in India. In order to test the generalizability of the
models, two separate unlabelled test sets are provided. The main test set
contains 80 English news sentences from India, and the secondary test set contains
39 English news sentences from China. As shown in Figure 1, the average
sentence length in both the training and test sets is around 50. However, while the
maximum length in the training data is 440, the test data contains two sentences
with lengths 579 and 643.</p>
      <p>The whole dataset is pre-tokenized and shared in the standard CoNLL
format. There are 8 different classes in the dataset, named
participant, trigger, loc, place, etime, fname, organizer, and target, given in
beginning-inside-outside (BIO) tagging format. The sample sizes of the classes, counted over
complete entities, vary as shown in Figure 2. The trigger class is used 970
times, the place class 318 times, and the least used class, fname, has 128
labels.</p>
      <p>Fig. 2: The distribution of label usage in the training data. The graph shows
the count of labels per complete entity, although the data is received in BIO
tagging format. The uninformative outside class is disregarded.</p>
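      <p>To illustrate how labels given in BIO format can be counted per complete entity, the following is a minimal Python sketch. It is not the code used in this study; the function name bio_to_entities, the toy tag sequence, and the handling of a stray I- tag as the start of a new entity are assumptions made for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def bio_to_entities(tags):
    """Group BIO tags into (label, start, end) spans; end is exclusive."""
    entities = []
    label, start = None, None
    for i, tag in enumerate(tags):
        # An I- tag with the same label as the open entity extends it.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the entity that is currently open.
        if label is not None:
            entities.append((label, start, i))
            label, start = None, None
        # B- opens a new entity; a stray I- is treated the same way (assumption).
        if tag.startswith("B-") or tag.startswith("I-"):
            label, start = tag[2:], i
    if label is not None:
        entities.append((label, start, len(tags)))
    return entities


# Toy tag sequence, invented for illustration only.
tags = ["B-participant", "I-participant", "B-trigger", "O",
        "B-place", "I-place", "O"]
entities = bio_to_entities(tags)
# One count per complete entity, not per token; the outside class is ignored.
counts = Counter(label for label, _, _ in entities)
```
]]></preformat>
      <p>In this toy sequence, six labelled tokens collapse into three entities, which is the kind of per-entity count reported for the label distribution.</p>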
    </sec>
    <sec id="sec-4">
      <title>Methods</title>
      <p>Below, we describe the two models we trained for the task and the evaluation
process.</p>
      <sec id="sec-4-1">
        <title>Conditional Random Fields Algorithm</title>
        <p>
          We employ Conditional Random Fields through the Python binding of the
CRFSuite library [18]. The CRF model is trained on the given training data, and the
hyperparameters are optimized on the given development set using the Random
Search method [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], in particular to optimize the regularization parameters known
as the "C" parameters.
        </p>
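        <p>A random search over two regularization parameters can be sketched as follows. This is an illustration under stated assumptions, not the configuration used in this study: the parameter names c1 and c2 follow CRFsuite's L1 and L2 coefficients, the exponential sampling distributions and their means are assumed, and score_on_dev is a hypothetical callback that would train a model and return its development-set score. A toy scoring function stands in for it here.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random


def random_search(score_on_dev, n_trials=50, seed=0):
    """Sample regularization settings at random and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            # expovariate(lambd) has mean 1 / lambd (assumed distributions).
            "c1": rng.expovariate(1 / 0.5),   # L1 coefficient, mean 0.5
            "c2": rng.expovariate(1 / 0.05),  # L2 coefficient, mean 0.05
        }
        score = score_on_dev(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for "train a CRF and score it on the dev set".
    # It peaks at c1 = 0.1 and c2 = 0.01.
    return -((params["c1"] - 0.1) ** 2 + (params["c2"] - 0.01) ** 2)


best_params, best_score = random_search(toy_score)
```
]]></preformat>
        <p>Fixing the seed makes the sampled configurations reproducible, so the same trial budget can be compared across feature sets.</p>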
        <p>In classical supervised machine learning, the features have a great impact on
classification accuracy. For feature extraction, we use a sliding window
with a length of 5 tokens, meaning that if we are extracting the features of a token
at position i, the sliding window contains token<sub>i-2</sub>, token<sub>i-1</sub>, token<sub>i</sub>, token<sub>i+1</sub>,
and token<sub>i+2</sub>. Besides taking the features of the individual tokens in the sliding
window independently, we also include bi-grams and tri-grams of the tokens and of
most of the features.</p>
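        <p>The 5-token sliding window with bi-grams and tri-grams can be sketched in Python as below. The feature key names, the _PAD_ placeholder for positions outside the sentence, and the example sentence are assumptions for illustration and are not taken from our implementation.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def window_features(tokens, i, size=2):
    """Unigram, bi-gram and tri-gram features around tokens[i]."""
    offsets = range(-size, size + 1)
    # Collect the window, padding positions that fall outside the sentence.
    window = []
    for offset in offsets:
        j = i + offset
        window.append(tokens[j] if j in range(len(tokens)) else "_PAD_")

    features = {}
    # Individual tokens, keyed by their offset relative to the focus token.
    for offset, tok in zip(offsets, window):
        features[f"tok[{offset}]"] = tok
    # Bi-grams and tri-grams over adjacent positions inside the window.
    for n in (2, 3):
        for k in range(len(window) - n + 1):
            start = k - size
            key = f"tok[{start}..{start + n - 1}]"
            features[key] = " ".join(window[k:k + n])
    return features


# Invented example sentence.
tokens = ["Farmers", "staged", "a", "protest", "in", "Delhi"]
features = window_features(tokens, 0)
```
]]></preformat>
        <p>With a window of 5 positions this yields 5 unigram, 4 bi-gram and 3 tri-gram features per token. The same routine could be applied to lemmas or part-of-speech tags by passing those sequences instead of the raw tokens.</p>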
        <p>The features extracted to train the CRF model are listed below:</p>
        <list list-type="bullet">
          <list-item><p>Context: The focus token and its surroundings in a sliding window, and n-gram combinations.</p></list-item>
          <list-item><p>Lemmas: Lemmatized versions of the focus token and its surroundings in a sliding window, and n-gram combinations, obtained using the spaCy [<xref ref-type="bibr" rid="ref12">12</xref>] NLP tool.</p></list-item>
          <list-item><p>Orthographic Types: The orthographic types of the tokens in a sliding window, and n-gram combinations.</p></list-item>
          <list-item><p>Part-of-Speech: Part-of-speech tags of the tokens in a sliding window, and n-gram combinations, obtained using the spaCy [<xref ref-type="bibr" rid="ref12">12</xref>] NLP tool.</p></list-item>
          <list-item><p>Temporal Tags: Temporal tags of the tokens in a sliding window, and n-gram combinations, obtained using the HeidelTime temporal tagging tool [21].</p></list-item>
          <list-item><p>Named Entities: Named entity tags of the tokens in a sliding window, and n-gram combinations, obtained using the spaCy [<xref ref-type="bibr" rid="ref12">12</xref>] NLP tool.</p></list-item>
          <list-item><p>isCapital: A boolean value indicating whether the first letter of token<sub>i</sub> is capitalised.</p></list-item>
        </list>
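        <p>The exact definition of an orthographic type is not given in the text. As one possible reading, the sketch below maps each token to a compressed character-class shape and computes the isCapital flag. The shape alphabet (X, x, d) and both function names are assumptions made for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def orthographic_type(token):
    """Compressed character-class shape, e.g. 'Delhi' gives 'Xx'."""
    shape = []
    for ch in token:
        if ch.isupper():
            cls = "X"
        elif ch.islower():
            cls = "x"
        elif ch.isdigit():
            cls = "d"
        else:
            cls = ch  # punctuation is kept as-is
        # Collapse runs of the same class into a single symbol.
        if not shape or shape[-1] != cls:
            shape.append(cls)
    return "".join(shape)


def is_capital(token):
    """True when the first character of the token is upper case."""
    return token[:1].isupper()
```
]]></preformat>
        <p>Under this reading, tokens such as "Delhi" and "Mumbai" share the shape "Xx", which lets the model generalize over unseen capitalised words.</p>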
      </sec>
      <sec id="sec-4-2">
        <title>Recurrent Neural Network Approach</title>
        <p>We propose a Recurrent Neural Network architecture that, in brief, consists of the
ELMo word embeddings network, two residual BiLSTM layers and a time-distributed
dense layer with softmax activation at the end, as shown in Figure 3.</p>
        <p>
          We use the ELMo network pretrained on the 1 Billion Word Benchmark [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and
distributed on TensorFlow Hub (https://www.tensorflow.org/hub, accessed on 24/05/2019).
ELMo is therefore excluded from our training process
and used only as a feature extraction layer. ELMo embeddings are generated over
entire word sequences, unlike other popular embeddings. Consequently,
the input layer of our network requires tokenized sentences as sequences. We set
ELMo to return word embeddings of 1024 dimensions for each input sentence
sequence. The embeddings are then fed to the residually connected BiLSTM
layers, each with 512 units and a dropout rate of 0.2.
        </p>
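        <p>The dimensions above can be checked with a short piece of arithmetic. The sketch below assumes that the 512 units are per direction, that the forward and backward outputs are concatenated, and that the residual connection is an element-wise sum of the two BiLSTM outputs; none of these details is stated explicitly in the text. The parameter count uses the standard LSTM formula with four gates and a bias.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def lstm_params(input_dim, units):
    """Trainable weights of one standard LSTM layer (four gates, with bias)."""
    return 4 * (units * (input_dim + units) + units)


def bilstm_output_dim(units):
    # Forward and backward outputs are concatenated (assumption).
    return 2 * units


EMBEDDING_DIM = 1024  # ELMo output size reported in the text
UNITS = 512           # assumed to be per direction

layer1_out = bilstm_output_dim(UNITS)
layer2_out = bilstm_output_dim(UNITS)
# An element-wise residual sum requires both outputs to have the same width.
residual_ok = layer1_out == layer2_out

# Each BiLSTM holds two LSTMs, one per direction.
params_layer1 = 2 * lstm_params(EMBEDDING_DIM, UNITS)
params_layer2 = 2 * lstm_params(layer1_out, UNITS)
```
]]></preformat>
        <p>Under these assumptions each BiLSTM layer outputs 1024 values per token, matching the ELMo embedding width, so the layers can be summed without a projection.</p>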
        <p>We use the Adam optimizer with a learning rate of 0.001 to minimize
the sparse categorical cross-entropy loss function. We observe the
relation between the number of epochs and the training and validation losses. To this end, we initially
train our network for 30 epochs. After the 10th epoch, however, we do not observe
any difference in the validation accuracy. The learning rate, dropout rate, number of units,
and dimensions are chosen heuristically.</p>
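        <p>For reference, sparse categorical cross-entropy takes an integer class index per token and the predicted probability distribution over classes, and returns the negative log-probability of the true class. The following pure-Python sketch shows the computation on invented probabilities; it illustrates the definition and is not the Keras implementation.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def sparse_categorical_crossentropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])


def sequence_loss(prob_rows, labels):
    """Mean per-token loss over one labelled sequence."""
    losses = [sparse_categorical_crossentropy(p, y)
              for p, y in zip(prob_rows, labels)]
    return sum(losses) / len(losses)


# Invented softmax outputs for three tokens over three classes.
prob_rows = [[0.7, 0.2, 0.1],
             [0.1, 0.8, 0.1],
             [0.25, 0.25, 0.5]]
labels = [0, 1, 2]  # integer class indices, not one-hot vectors
loss = sequence_loss(prob_rows, labels)
```
]]></preformat>
        <p>The "sparse" variant avoids building one-hot label vectors, which is convenient when every token of every sentence carries a label.</p>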
        <p>
          The network and the loss function are implemented in Keras (v2.2.4)
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] with the TensorFlow (v1.14.1) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] back-end.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Evaluation</title>
        <p>The algorithms are evaluated on the submission platform of the
ProtestNews shared task [14]. The evaluation metrics used are precision, recall
and F1-score.</p>
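        <p>The three metrics, and a macro average over classes, can be computed as in the sketch below. The matching criterion of the submission platform is not described here, so the sketch assumes that an entity counts as correct only when its label and its exact span both match. The entity tuples are invented for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def precision_recall_f1(gold, predicted):
    """Scores over sets of (label, start, end) entities, exact match only."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold.intersection(predicted))
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_f1(gold, predicted):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted({e[0] for e in gold} | {e[0] for e in predicted})
    scores = []
    for label in labels:
        gold_l = [e for e in gold if e[0] == label]
        pred_l = [e for e in predicted if e[0] == label]
        scores.append(precision_recall_f1(gold_l, pred_l)[2])
    return sum(scores) / len(scores) if scores else 0.0


# Invented entities; the predicted place span is off by one token.
gold = [("participant", 0, 2), ("trigger", 2, 3), ("place", 4, 6)]
predicted = [("participant", 0, 2), ("trigger", 2, 3), ("place", 5, 6)]
precision, recall, f1 = precision_recall_f1(gold, predicted)
macro = macro_f1(gold, predicted)
```
]]></preformat>
        <p>Because the macro average weights every class equally, a rare class such as fname influences the score as much as a frequent one such as trigger.</p>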
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>On the primary test set, we observe a macro-averaged F1-score of 37.64 with the
CRF-based model and 55.75 with the RNN model, as shown in Figure 4. However,
we also see that CRF displays a higher precision score than RNN, while RNN
outperforms the CRF model on recall. Thus, the RNN model displays a more
balanced performance than CRF. On the secondary test set, the performance of the
CRF model drops significantly, and RNN outperforms CRF on every
score.</p>
      <p>In Figure 5, we observe that both models lose performance
when tested on the secondary test set. The performance decrease of
the CRF-based method is clearly visible on the second test set, whereas the RNN-based
method compensates for this performance drop better.</p>
      <p>Overall, our best performing model achieves an average F1-score of 55.75.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this study, we have proposed a Recurrent Neural Network based approach
to extract information from protest news. Furthermore, we have built a classical
Conditional Random Fields model as our strong baseline to contrast
a classical machine learning approach with a more recent method, both in
performance and in generalizability.</p>
      <p>We have seen that the Recurrent Neural Network based model significantly
outperforms the Conditional Random Fields based approach on both the primary
set and the secondary set. In a setup designed to test the generalizability of each model,
we have shown that the CRF-based model confirms our initial claims about the
generalizability problem. While both models lost performance on the
secondary test set, we conclude that the RNN-based approach shows better promise
on generalizability than the CRF method.</p>
      <p>Fig. 4: Results of the proposed methods on both test sets. The India-English test set
is referred to as the primary test set. The China-English test set is referred to as the secondary test set.</p>
      <p>In future studies, we would like to include character embeddings and evaluate
whether character-level information improves the results of the
RNN-based model.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barham</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brevdo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Citro</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghemawat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Irving</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jozefowicz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kudlur</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mane</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murray</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olah</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shlens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steiner</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talwar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tucker</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasudevan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viegas</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warden</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wattenberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wicke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>TensorFlow: Large-scale machine learning on heterogeneous systems</article-title>
          (
          <year>2015</year>
          ), http://tensorflow.org/, software available from tensorflow.org
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bergstra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Random search for hyper-parameter optimization</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>13</volume>
          (Feb),
          <fpage>281</fpage>–<lpage>305</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Caliskan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bryson</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semantics derived automatically from language corpora contain human-like biases</article-title>
          .
          <source>Science</source>
          <volume>356</volume>
          (
          <issue>6334</issue>
          ),
          <fpage>183</fpage>–<lpage>186</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chelba</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ge</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brants</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koehn</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>One billion word benchmark for measuring progress in statistical language modeling</article-title>
          .
          <source>Computing Research Repository (CoRR) abs/1312.3005</source>
          ,
          <fpage>1</fpage>–<lpage>6</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chollet</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , et al.: Keras. https://keras.io (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cowie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehnert</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Information extraction</article-title>
          .
          <source>Commun. ACM</source>
          <volume>39</volume>
          (
          <issue>1</issue>
          ),
          <fpage>80</fpage>–<lpage>91</lpage>
          (Jan
          <year>1996</year>
          ). https://doi.org/10.1145/234173.234209, http://doi.acm.org/10.1145/234173.234209
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Culotta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Confidence estimation for information extraction</article-title>
          .
          <source>In: Proceedings of HLT-NAACL 2004: Short Papers</source>
          . pp.
          <fpage>109</fpage>–<lpage>112</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Cuong</surname>
            ,
            <given-names>N.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chandrasekaran</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>M.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>W.S.</given-names>
          </string-name>
          :
          <article-title>Scholarly document information extraction using extensible features for efficient higher order semi-CRFs</article-title>
          .
          <source>In: Proceedings of the 15th ACM/IEEE-CS joint conference on digital libraries</source>
          . pp.
          <fpage>61</fpage>–<lpage>64</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Finkel</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grenager</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Incorporating non-local information into information extraction systems by Gibbs sampling</article-title>
          .
          <source>In: Proceedings of the 43rd annual meeting on association for computational linguistics</source>
          . pp.
          <fpage>363</fpage>–<lpage>370</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Strotgen, J.,
          <string-name>
            <surname>Berberich</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Eventminer: Mining events from annotated documents</article-title>
          .
          <source>In: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval</source>
          . pp.
          <fpage>261</fpage>–<lpage>270</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Comput.</source>
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1735</fpage>–<lpage>1780</lpage>
          (Nov
          <year>1997</year>
          ). https://doi.org/10.1162/neco.1997.9.8.1735, http://dx.doi.org/10.1162/neco.1997.9.8.1735
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Honnibal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montani</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing</article-title>
          . To appear (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>