=Paper=
{{Paper
|id=Vol-2621/CIRCLE20_37
|storemode=property
|title=Event Detection and Time Series Alignment to Improve Stock Market Forecasting
|pdfUrl=https://ceur-ws.org/Vol-2621/CIRCLE20_37.pdf
|volume=Vol-2621
|authors=Elliot Maitre, Zakaria Chemli,Max Chevalier,Bernard Dousset,Jean-Philippe Gitto,Olivier Teste
|dblpUrl=https://dblp.org/rec/conf/circle/MaitreCCDGT20
}}
==Event Detection and Time Series Alignment to Improve Stock Market Forecasting==
Elliot Maître, Institut de Recherche en Informatique de Toulouse / Scalian, Toulouse, France, elliot.maitre@irit.fr
Zakaria Chemli, Scalian, Paris, France, zakaria.chemli@scalian.com
Max Chevalier, Institut de Recherche en Informatique de Toulouse, Toulouse, France, max.chevalier@irit.fr
Bernard Dousset, Institut de Recherche en Informatique de Toulouse, Toulouse, France, bernard.dousset@irit.fr
Jean-Philippe Gitto, Scalian, Blagnac, France, jean-philippe.gitto@scalian.com
Olivier Teste, Institut de Recherche en Informatique de Toulouse, Toulouse, France, olivier.teste@irit.fr
ABSTRACT
Buying commodities is a critical issue for multiple industries because the variations of stock prices are induced not only by multiple economic parameters but also by external events. Raw material buyers must keep track of information in numerous fields, which constitutes a major challenge considering the exponential growth of online data. To tackle this issue, we propose an event detection approach to assist them in their anticipation process. Indeed, a lot of contextual information is contained in text, and exploiting it can improve their ability to anticipate. Thus, we develop a framework for event detection and qualification, then we quantify the impact of these events on the stock market to help buyers in their anticipation process. In this paper, we first introduce our context, then explain the scope of our work and our goals. After detailing the related work, we present our proposition, conclude and propose some possibilities for future work.

CCS CONCEPTS
• Information systems → Data management systems; Information retrieval; • Computing methodologies → Natural language processing.

KEYWORDS
Event detection, text analysis, NLP, neural networks, time series, commodities

"Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

1 INTRODUCTION
Time series play a major role in several industrial fields, such as energy [1], transport [29], economy [11] or finance [28]. Being able to accurately forecast time series is a major asset for companies that need to anticipate the modeled phenomenon. In commodities buying, the stock market is described by time series and is particularly volatile, making its forecasting both a strategic and a challenging task [7], [10]. Classic stock forecasting methods like [15] or [24] are usually based on economic data, such as currencies, indices or futures, but most of them do not take into account textual data, which can contain precious information. Improving time series forecasting using textual information is a challenging research issue [30].

In order to extract text data, multiple sources can be considered. An important one is micro-blogging. Several studies have shown the predictive power of such media [23]. Sentiment analysis on Twitter can be helpful [2], the activity on social networks can be correlated with variations of the stock [26], and Twitter data can be used to forecast polls that are then used to interpret stock variations [22]. Specialized financial websites, such as Seeking Alpha, where communities of traders share their insights about the stock market, also contain meaningful information for stock market forecasting [5]. Thus, multiple sources of information like micro-blogging and specialized community websites can be combined to improve stock market forecasting.

Leveraging the expertise of several buyers via multiple interviews, we observed that they base their decisions on events happening in the real world, as reported by newspapers and social networks. Hence, given that the stock market reacts to news and events [8], we particularly focus on event detection in text. Indeed, some periods are more intense than others [4] and are considered more important. These periods, characterized by some events, carry more information than other periods. Being able to detect these events and quantify their impact constitutes a major asset for buyers and traders. It is a difficult task, as illustrated by the impact on the stock market of the Covid-19 outbreak, which was widely discussed but largely underestimated. With adapted tools, one could have anticipated this crisis and behaved accordingly in order to mitigate the impact.

Our research aims at providing a tool that leverages information contained in text data, especially events, in order to assist people in their time series anticipation process, i.e. commodities buyers in our context. In this paper we focus on the event detection step. We first introduce our general work, then we focus on the related work about event detection in text. Afterwards, we develop our proposal.

2 OVERVIEW OF OUR PROPOSAL
The task of commodities price forecasting is particularly complex due to the large number of parameters that influence the
                                                                                                                                      Maître, et al.
                                                          Figure 1: General approach
variations of the stock. To bring more contextual information to the buyers and to our model, we want to combine time series with text information. This is not a straightforward process and it needs to be broken down into sub-tasks. Hence, our work is articulated around three major steps, as illustrated by Figure 1:
    (1) Time series analysis to find coherent temporal areas,
    (2) Temporal event extraction,
    (3) Events and time-series alignment.

While these steps are mutually dependent, it is also possible to treat them separately. Each of them constitutes a scientific challenge and thus will be developed separately [20], [9], [14], [25], [24], [15]. In the rest of this paper, we focus on part (2), which is the part we are currently working on, and give insights about (3), which is the next step of our work. Part (1) is currently not in the scope of this work; we plan to use existing approaches to tackle this issue.

3 RELATED WORK
There are different approaches to perform event detection in text. The two principal ones are topic modeling and event trigger detection. The former is a statistical approach while the latter is based on word classification.

3.1 Event trigger based approaches
The event trigger based approach is a classification method which consists in classifying words into event categories. Some words, named trigger words, are supposed to trigger the event in the sentence and carry its meaning. Detecting and classifying those words hence allows one to understand whether a sentence depicts an event. ACE 2005 [12] is the reference dataset for this task and has been studied multiple times [20], [9], [14]. According to the ACE 2005 annotation guideline, in the sentence "A police officer was killed in New Jersey today", an event detection system should be able to recognize the word "killed" as a trigger for the event "Die". Currently, the state of the art for this task is achieved by using neural networks, and several approaches have been proposed on this basis. Nguyen and Grishman introduced in [20] a CNN-based approach to detect these triggers. In [9], the authors improve this work by adding a Bi-LSTM to the CNN in order to include sentence context in the detection. The authors of [14] propose a self-regulating GAN to perform the detection. In [18], the authors include even more context through a document-scale approach.

3.2 Topic modeling approaches
While the former approach is mostly based on semantic and syntactic properties, topic modeling approaches are statistical approaches. The authors of [27] propose to use Twitter users as human sensors to detect earthquake occurrences in real time. They use keywords to detect these target events and probabilistic models to detect the location of the events. Weng et al. [31] analyze the wavelet signal of words in tweets in order to filter out trivial words, and cluster words to detect events. In [17], the authors analyze daily topics on Twitter via Latent Dirichlet Allocation (LDA) and then determine the similarity between daily topics. They detect bumps in word usage and then cluster topics into "eventy topics". The authors of [21] propose a sub-event detection technique using topic modeling. This technique detects sub-events linked to an event and assigns a label to these sub-events. In [13], the authors propose a real-time framework to detect minor and major events on Twitter. The first module of the framework detects events and then the second module clusters these events.

Thus, event trigger based approaches tend to exploit the power of deep neural networks while topic modeling approaches are based on the frequency of words and on what is discussed on social networks. We argue that combining the assets of each technique could be
an interesting objective. The power of representation brought by neural networks is complementary to the detection approach of topic modeling.

4 OUR PROPOSAL: EVENT DETECTION COMBINING TOPIC MODELING AND NEURAL MODELS
Several constraints, such as the influence of possibly unknown parameters and the real-time nature of the data, arise from the definition of the stock market. To predict future stock, one must exploit historical data but also real-time data. Hence, our framework must be applicable to data streams such as the Twitter stream. Moreover, some events may not be comparable to past events, so the classification must be able to handle and assign labels to unknown classes. However, we do not aim at real-time commodities trading; we want to assist buyers in their daily buying decisions. We only want our solution to be applicable in a real-time context, i.e. with a granularity sufficient to help buyers in their daily transactions.

4.1 Motivations
Topic modeling approaches correspond to our prerequisites, but some of them are not adapted to data streams or do not work with unknown classes. Recent work which satisfies our constraints fails to exploit the properties of the language and is only based on a probabilistic approach linked with word occurrences.

Neural based approaches, such as the methods used in the trigger-based approaches, are powerful at exploiting patterns discovered in past data. Moreover, they bring more information by leveraging semantic and syntactic information, with methods such as word and sentence embeddings.

Our goal is to exploit this information to improve the quality of event representation. We think that these approaches are complementary and we assert that combining them will allow us to leverage the time and frequency aspect derived from topic modeling and the representation power of neural networks, in order to optimize event classification.

4.2 Our method
To do so, we propose a novel approach based on word and sentence embeddings. The idea behind this method is to leverage the geometric power of these embeddings. Using the representation obtained, similar documents should have similar representations in the embedding space. By comparing the distance between documents, we will be able to create clusters of documents. Each cluster corresponds to an event. Some events may be related, and clusters of similar events might be regrouped in an event cluster. This event cluster represents a class of events, such as sports events or geopolitical events. Hence, unknown events can be assimilated to events in the same event cluster. We will order documents by their time of appearance, so we can adapt to the real-world context we want to apply this method to, i.e. commodities stock estimation using event detection in a text data stream.

Our proposition is articulated as follows:
    (1) Text data is extracted from sources previously selected by buyers, such as trusted Twitter users, in order to gather text written in regular English and focused on sharing important information. Indeed, most of the content on the internet is created by a few users.
    (2) In order to have an exploitable event representation, we embed the content, using word embedding and sentence embedding.
    (3) The embedded content is clustered, leveraging the amount of information the embeddings bring. This can be done by placing the embedded content on the vertices of a graph and creating an edge between each pair of vertices, weighted by the distance between the two embeddings. If the distance exceeds a certain threshold, the edge is removed in order to create clusters of related contents.
    (4) The clusters are labeled by determining a representative document. An example of a representative document is the document with the minimum average distance to the other tweets of the cluster.

Thus, the clusters obtained are expected to be of great quality thanks to a better representation, allowing a better identification and classification of events. These detected events will have two usages: they will be used in the next steps in order to estimate the variations of the time series, and they will also be given to the buyers in order to help them make their decisions, alongside our time series estimation. Since the tweets are extracted from the Twitter stream, we will order them by their order of appearance, which allows us to take time into account and adapt to the type of application we want.

4.3 Pros and cons
This method brings more information than a regular topic modeling approach, leveraging the representation power of neural based approaches. It allows us to consider the documents in a time-ordered manner, which is not the case in most classification problems. This makes it suitable for time-based applications such as ours.

However, the efficiency of such a model for unknown events is not certain. Indeed, neural networks sometimes fail to generalize correctly. Handling an event containing too much novelty might be misleading for some models. The time aspect may also have some impact on the efficiency of the model.

Moreover, neural based approaches require annotated data, which is not always available, especially in contexts such as Twitter where the amount of data is huge. This problem has been considered in recent work, notably in [19], where the authors propose a weakly-supervised approach to limit annotation time. The problem of unknown classes is not appropriately handled by these approaches. Detecting novelty without labeling it could be an insight in order to detect change in the time series, but in the meantime, we want to focus on a method allowing us to label unknown events.

Thus, this method helps us bring more information in order to fulfill our classification objective and to adapt to our time-dependent context; however, it may raise several issues that we have not addressed yet.
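
Steps (3) and (4) of Section 4.2 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cosine distance, assumes that an edge is kept when the distance is at or below the threshold, treats connected components of the resulting graph as clusters, and uses toy 2-D vectors in place of real sentence embeddings. The names cosine_distance, cluster_by_threshold and representative are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cluster_by_threshold(embeddings, threshold):
    """Keep only edges with distance <= threshold, then label connected components."""
    n = len(embeddings)
    dist = [[cosine_distance(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a cluster
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first traversal over the kept edges
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] <= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels, dist


def representative(labels, dist, cluster_id):
    """Index of the document with minimum average distance to its cluster."""
    members = [i for i, label in enumerate(labels) if label == cluster_id]
    return min(members,
               key=lambda i: sum(dist[i][j] for j in members) / len(members))


# Toy "embeddings", listed in order of appearance in the stream.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.95, 0.05]]
labels, dist = cluster_by_threshold(docs, threshold=0.1)
print(labels)                            # cluster id per document
print(representative(labels, dist, 0))   # index of the representative of cluster 0
```

Connected components are only one possible reading of "clusters of related contents"; with a loose threshold they can chain unrelated documents together, so the threshold and the distance would need tuning on real embeddings.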
                                                             Figure 2: GAN example
5 LINKING EVENTS AND TIME SERIES VARIATIONS TO ESTIMATE FUTURE TIME SERIES VARIATIONS
Following the idea of combining time series and text, the detected events will be fed to a generative adversarial network (GAN) along with time series data, to predict expected variations of the stock prices. Figure 2 illustrates the process we describe. Our intuition is that the GAN will be able to link detected events and variations in the time series. A GAN is composed of two major parts: the generator and the discriminator. The generator tries to mimic the actual data and the discriminator tries to identify fake data produced by the generator. We want to produce time series estimations, so our solution is articulated as follows: the generator part of the GAN will produce time series estimations taking events as input. The discriminator will be fed with two inputs, the actual time series and the fake time series, which is produced by the generator. The objective for the generator is to produce time series estimations that are close enough to reality to fool the discriminator. The objective of the discriminator is to maximize its accuracy in differentiating fake from real input. Since the final output we want is a time series estimation, our general objective is to have a generator as optimized as possible. The discriminator is only used in the training loop, in order to give feedback to the generator and train it to produce valuable output. In order to give hints about the future time series variations, the generator will take as input the events we have previously detected, which are supposed to carry information that influences these variations. By training it properly, the generator will be able to extract information from the events and from the feedback of the discriminator. The feedback from the discriminator contains information about the time series, which is not directly available to the generator. Indeed, the final objective is to have a generator which is able to predict time series variations by only exploiting the events we detect.

To summarize, the GAN corresponds to the event-quantifying step and the event-time series alignment step. Its objective is to automatically extract information from the detected events it takes as input, and link it with the variations in the historical time series data.

6 CONCLUSION
Considering the constraints induced by our context, namely detecting possibly unknown events in order to help buyers in their daily buying decisions, we deduced that a combination of topic modeling approaches and neural based models is a promising method to complete our task. We propose to embed content using recent models, i.e. word and sentence embeddings, in order to produce a better clustering leveraging the representation power of these models, and therefore obtain a better event classification.

7 FUTURE WORK
In [3], the authors temporalize word2vec to detect the most discussed topics during certain phases of the bitcoin time series. We would like to transpose this idea to our context, by detecting which events are activated during special phases of the commodities stock. Using the timestamps of the documents, the idea is to determine which clusters of events are activated during a certain period of time and link them with stock variations. While using timestamps to order documents is not difficult, determining when an event is activated brings many more difficulties, such as tracking event evolution and detecting the end of an event. Another goal is to be able to directly link time series and events, with a method similar to [25]. Finally, Transformer architectures are currently revolutionising the NLP domain. We would like to be able to better represent events, leveraging the power of Transformer-based models such as BERT [6]. Wu et al. did something similar with news representation in [32]. Indeed, transformers are able to produce quality embeddings for both words and sentences and have proved their quality by outperforming static embedding techniques. A major drawback of transformer-based methods is their computation cost. Thus, the usage of distilled models such as TinyBERT [16] could be a solution.
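
The adversarial setup of Section 5 can be written compactly. The paper does not specify a loss, so the following is only a sketch that assumes the standard GAN minimax objective, with hypothetical symbols: $e$ denotes the detected events for a period, $x$ the actual time series for that period, $G$ the generator and $D$ the discriminator.

```latex
\min_{G}\,\max_{D}\;
  \mathbb{E}_{x}\big[\log D(x)\big]
  + \mathbb{E}_{e}\big[\log\big(1 - D(G(e))\big)\big]
```

In this form $D$ sees only real and generated series while $G$ sees only events, which matches the description in Section 5. Whether $D$ should also be conditioned on $e$, or $G$ should receive an additional noise input, is left open here.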
REFERENCES
 [1] John Asafu-Adjaye. 2000. The Relationship between Energy Consumption, Energy Prices and Economic Growth: Time Series Evidence from Asian Developing Countries. Energy Economics 22 (12 2000), 615–625. https://doi.org/10.1016/S0140-9883(00)00050-5
 [2] Johan Bollen, Huina Mao, and Xiao-Jun Zeng. 2010. Twitter mood predicts the stock market. CoRR abs/1010.3003 (2010). arXiv:1010.3003 http://arxiv.org/abs/1010.3003
 [3] Andrew Burnie and Emine Yilmaz. 2019. An Analysis of the Change in Discussions on Social Media with Bitcoin Price. 889–892. https://doi.org/10.1145/3331184.3331304
 [4] Patrick Champagne. 2000. L’événement comme enjeu. (2000). https://doi.org/10.3406/reso.2000.2231
 [5] Hailiang Chen, Prabuddha De, Yu Hu, and Byoung-Hyoun Hwang. 2013. Wisdom of Crowds: The Value of Stock Opinions Transmitted Through Social Media. Review of Financial Studies (12 2013). https://doi.org/10.2139/ssrn.1807265
 [6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. https://doi.org/10.18653/v1/N19-1423
 [7] Claude B. Erb and Campbell R. Harvey. 2006. The Strategic and Tactical Value of Commodity Futures. Financial Analysts Journal 62, 2 (2006), 69–97. https://doi.org/10.2469/faj.v62.n2.4084
 [8] Eugene F. Fama. 1965. The Behavior of Stock-Market Prices. The Journal of Business 38, 1 (1965), 34–105. http://www.jstor.org/stable/2350752
 [9] Xiaocheng Feng, Lifu Huang, Duyu Tang, Heng Ji, Bing Qin, and Ting Liu. 2016. A Language-Independent Neural Network for Event Detection. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Berlin, Germany, 66–71. https://doi.org/10.18653/v1/P16-2011
[10] Gary Gereffi. 1999. International trade and industrial upgrading in the apparel commodity chain. Journal of International Economics 48, 1 (June 1999), 37–70. https://ideas.repec.org/a/eee/inecon/v48y1999i1p37-70.html
[11] Clive Granger and Paul Newbold. 1986. Forecasting Economic Time Series (2 ed.). Elsevier. https://EconPapers.repec.org/RePEc:eee:monogr:9780122951831
[12] Ralph Grishman, David Westbrook, and Adam Meyers. 2005. NYU’s English ACE 2005 system description. Proceedings of ACE 2005 Evaluation Workshop. Journal on Satisfiability 51 (01 2005).
[13] Mahmud Hasan, Mehmet A. Orgun, and Rolf Schwitter. 2019. Real-time event detection from the Twitter data stream using the TwitterNews+ framework. Information Processing and Management 56, 3 (5 2019), 1146–1165. https://doi.org/10.1016/j.ipm.2018.03.001
[14] Yu Hong, Wenxuan Zhou, Jingli Zhang, Guodong Zhou, and Qiaoming Zhu. 2018. Self-regulation: Employing a Generative Adversarial Network to Improve Event Detection. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne, Australia, 515–526. https://doi.org/10.18653/v1/P18-1048
[15] Wei Huang, Yoshiteru Nakamori, and Shou-Yang Wang. 2005. Forecasting Stock Market Movement Direction with Support Vector Machine. Comput. Oper. Res. 32, 10 (Oct. 2005), 2513–2522. https://doi.org/10.1016/j.cor.2004.03.016
[16] Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu. 2020. TinyBERT: Distilling BERT for Natural Language Understanding. https://openreview.net/forum?id=rJx0Q6EFPB
[17] Nathan Keane, Connie Yee, and Liang Zhou. 2015. Using Topic Modeling and Similarity Thresholds to Detect Events. In Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation. Association for Computational Linguistics, Denver, Colorado, 34–42. https://doi.org/10.3115/v1/W15-0805
[18] Dorian Kodelja, Romaric Besançon, and Olivier Ferret. 2019. Exploiting a More Global Context for Event Detection Through Bootstrapping. 763–770. https://doi.org/10.1007/978-3-030-15712-8_51
[19] Shulin Liu, Yang Li, Feng Zhang, Tao Yang, and Xinpeng Zhou. 2019. Event Detection without Triggers. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 735–744. https://doi.org/10.18653/v1/N19-1080
[20] Thien Huu Nguyen and Ralph Grishman. 2015. Event Detection and Domain Adaptation with Convolutional Neural Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, Beijing, China, 365–371. https://doi.org/10.3115/v1/P15-2060
[21] Diogo Nolasco and Jonice Oliveira. 2019. Subevents detection through topic modeling in social media posts. Future Generation Comp. Syst. 93 (2019), 290–303.
[22] Brendan O’Connor, Ramnath Balasubramanyan, Bryan Routledge, and Noah Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. International AAAI Conference on Weblogs and Social Media 11.
[23] Nuno Oliveira, Paulo Cortez, and Nelson Areal. 2016. The impact of microblogging data for stock market prediction: Using Twitter to predict returns, volatility, trading volume and survey sentiment indices. Expert Systems with Applications 73 (12 2016). https://doi.org/10.1016/j.eswa.2016.12.036
[24] Ping-Feng Pai and Chih-Sheng Lin. 2005. A hybrid ARIMA and support vector machines model in stock price forecasting. Omega 33 (12 2005), 497–505. https://doi.org/10.1016/j.omega.2004.07.024
[25] Filipe Rodrigues, Ioulia Markou, and Francisco Pereira. 2018. Combining time-series and textual data for taxi demand prediction in event areas: A deep learning approach. Information Fusion 49 (07 2018). https://doi.org/10.1016/j.inffus.2018.07.007
[26] Eduardo Ruiz, Vagelis Hristidis, Carlos Castillo, Aristides Gionis, and Alejandro Jaimes. 2012. Correlating Financial Time Series with Micro-Blogging Activity. WSDM 2012 - Proceedings of the 5th ACM International Conference on Web Search and Data Mining, 513–522. https://doi.org/10.1145/2124295.2124358
[27] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. 2010. Earthquake Shakes Twitter Users: Real-Time Event Detection by Social Sensors. Proceedings of the 19th International Conference on World Wide Web, WWW ’10, 851–860. https://doi.org/10.1145/1772690.1772777
[28] Ruey S. Tsay. 2005. Analysis of financial time series (2nd ed.). Wiley-Interscience, Hoboken, NJ. http://gso.gbv.de/DB=2.1/CMD?ACT=SRCHA&SRT=YOP&IKT=1016&TRM=ppn+483463442&sourceid=fbw_bibsonomy
[29] Mascha C. van der Voort, Mark Dougherty, M.S. Dougherty, and Susan Watson. 1996. Combining Kohonen maps with Arima time series models to forecast traffic flow. Transportation research. Part C: Emerging technologies 4, 5 (1996), 307–318. https://doi.org/10.1016/S0968-090X(97)82903-8
[30] Baohua Wang, Hejiao Huang, and Xiaolong Wang. 2012. A novel text mining approach to financial time series forecasting. Neurocomputing 83 (04 2012), 136–145. https://doi.org/10.1016/j.neucom.2011.12.013
[31] Jianshu Weng and Bu-Sung Lee. 2011. Event Detection in Twitter. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2767
[32] Chuhan Wu, Fangzhao Wu, Mingxiao An, Yongfeng Huang, and Xing Xie. 2019. Neural News Recommendation with Topic-Aware News Representation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 1154–1159. https://doi.org/10.18653/v1/P19-1110