Forecasting out-of-the-ordinary financial events¹

Marco Brambilla² and Davide Greco³ and Sara Marchesini² and Luca Marconi² and Mirjana Mazuran² and Martina Morlacchi Bonfanti² and Alessandro Negrini² and Letizia Tanca²

¹ This research is partially supported by the IBM Faculty Award “SOFIA: Semi-autOmatic Financial Information Analytics”.
² Politecnico di Milano
³ Accento

Abstract. Being able to understand the financial market is very important for investors and, given the breadth and complexity of the topic, tools to support investor decisions are badly needed. In this paper we present Mercurio, a system that supports the decision-making process of financial investors through the automatic extraction and analysis of financial data coming from the Web. Mercurio formalizes the knowledge and reasoning of an expert in financial journalism and uses it to identify relevant events within financial newspapers. Moreover, it performs automatic analysis of financial indexes to identify relevant events related to the stock market. Then, sequential pattern mining is used to predict exceptional events on the basis of the knowledge of their past occurrences and relationships with other events, in order to warn investors about them.

1   Introduction

Financial data are produced daily and made available on the Web; the possibility of processing them allows us to model and study a world that is inherently complex due to the rules governing the financial market and to the internal and external factors influencing it. Investors constantly read financial news and analyze financial indexes, using their knowledge and experience to predict market events and make profitable investments. Our research aims at developing Mercurio, a decision support system to help investors during these activities.

Mercurio identifies relevant financial events, understands how they are related to each other and exploits this knowledge to predict future happenings. It uses: (i) the knowledge of an expert in financial journalism, whose deep understanding of the news goes beyond natural language processing alone, and (ii) financial indicators that provide an objective overview of the stock and, more generally, of the companies’ performance. On the one hand, a domain expert knows “how to” read an article and understand its meaning, especially since a literal reading might not coincide with the real meaning of what has happened. On the other hand, financial indicators provide an impartial overview of the past and current financial situation of companies. Financial happenings are all about signals and indications that companies leave behind throughout their life, and that the system must capture and interpret. Investment decisions are still made by human investors, and Mercurio provides them with more knowledge, possibly hidden from human observers, to improve their decision-making process.

Among the many financial data available on the Web, Mercurio looks for those that convey “important” happenings, i.e., happenings that influence and possibly shake the market: we call them events. Some of them are more relevant because they represent considerable changes of the financial market: we call them catastrophes, and they coincide with extraordinary financial moves (not necessarily negative, though), e.g. mergers and acquisitions, other significant moves of the company management, or stock price variations. The occurrence of a catastrophe is usually preceded by “symptoms” that we call signals. For example, an investor might observe that often, before a crash, a company gives an interview stating that profits are increasing; from then on, whenever such an interview is published the expert will expect the related stock to fall in the stock market. Thus, an article containing an interview about increasing profit is a signal, while a stock crash is a catastrophe.

The paper is organized as follows: Section 2 briefly describes some proposals with aims similar to ours, Section 3 gives the details of the Mercurio system, Section 4 describes the current state of the implementation and, finally, Section 5 draws the conclusions we have reached so far and outlines future research directions.

2   Related work

Market prediction receives constant attention in the financial literature: most approaches use only numerical data, but some also exploit textual information to increase the quality of the input data and improve predictions.

Works in [3, 4, 5, 6, 7] use Automated Text Categorization techniques to predict short-term market reactions to news. Articles are categorized depending on the influence their publication has on financial indexes, and then correlated with financial trends; different approaches use different types of classifiers. Our approach differs from these as we use expert knowledge to determine the relevance of articles. Among the examined works, [8] has a goal similar to Mercurio’s: finding sequences of articles that anticipate a changing trend. Once again the focus is on numerical data, while we are interested in predicting strategically extraordinary financial moves.

Existing works are primarily data-driven; however, some proposals use a-priori knowledge about the application domain. Works in [9, 10] analyze financial articles and create a handcrafted thesaurus containing words that drive the stock prices and that are later used to predict stock prices. Similarly, [11] uses a-priori domain knowledge to predict interest rates: a cognitive map represents cause-effect relationships among the events in the domain and is used as the basis to retrieve the relevant news; these are then classified as either positive or negative according to the way they influence the rates. A work similar to ours is [12], where the objective is to predict the Tokyo stock exchange price using a-priori knowledge in the form of rules. Domain rules are defined by eliciting non-numerical factors that influence the stock price; however, these rules differ from ours as they
convey general knowledge about political and international events. On the contrary, we focus on financial and economic events typical of a company’s life. The latter approaches differ from ours either in the way knowledge is represented or in the kind of knowledge adopted as background; we are currently trying to find a basis for an effective comparison, since the systems are not available and thus an experimental comparison on the same corpus is for the moment impossible.

To the best of our knowledge, a comprehensive system that makes use of both textual and numerical information to predict strategically extraordinary financial moves is still missing.

3   The Mercurio system

We envision an integrated and modular system that draws information from various sources and uses them appropriately with the final aim of predicting the happening of extraordinary financial events, that is, catastrophes. Finance is a kind of domain in which the key to successful data analysis is the integrated analysis of heterogeneous data, where time-dependent and highly frequent numerical data (e.g., price and volume) and textual data (e.g., news articles) should be considered jointly [13]. Both categories might encompass various data sources that can be easily added to the system (as shown in Figure 1). Each of the textual data sources is managed by an Event Recognizer that is able to extract events from the data and feed them into Mercurio. Events can be catastrophes (i.e. they convey considerable changes of the financial market) or signals (i.e. symptoms anticipating a catastrophe). Event recognition strategies vary depending on the type and nature of the managed data: for instance, each financial market (Italian, British, etc.) has its own language and dynamics, and there are differences also among financial newspapers of the same country.

[Figure 1. Mercurio architecture. Components shown: textual information sources (“Corriere della Sera”, “Sole 24 Ore”, “Radiocor”, ...), each with its own Event Recognizer; inside Mercurio, the Time-Sequence Generator, the Model Constructor and the Model Predictor; numerical information (Index 1, Index 2, ...); Alerts as output.]

In Mercurio, the events extracted from the financial news are received by the Time-Sequence Generator, which arranges them on one or more timelines depending on the use the system has to make of them. If the aim is to construct a model from all of them, then the Time-Sequence Generator creates a single timeline where all the received events are placed and provides this timeline as input to the Model Constructor. On the other hand, if the aim is to predict the future happenings related to specific companies, each created timeline contains only events related to a specific company, and these timelines are given as input to the Model Predictor.

The Model Constructor module takes a sequence of events and uses Sequential Pattern Mining techniques to find frequent subsequences of events, thus creating a model of the data represented as a set of sequential patterns. These patterns, together with the timeline of a company, are taken as input by the Model Predictor module, which uses them to forecast the happening of a certain catastrophe with respect to a certain company. The output provided by the Model Predictor consists of a set of alerts such as “there is a P% probability that company A will encounter catastrophe C within X timeslots”.

The most challenging and crucial aspect of the project is thus the process of event recognition and sequencing; however, as a side analysis, the time series generated by the Time-Sequence Generator can be compared with numerical data (indexes), arranged on their own timeline, in order to understand correlations between them.

3.1   Textual information

Events can be recognized inside textual information through text analysis; in Mercurio we propose the use of three different approaches:
• Semantic approach: events are recognized by means of semantic rules that formalize the knowledge and experience of our domain expert.
• Automatic approach: events are identified by applying clustering algorithms to financial news.
• Hybrid approach: a combination of the previous approaches where catastrophes are recognized with semantic rules and signals by means of clustering.

In the semantic approach, in particular, rules define a relationship between sentence structures and corresponding events. This is one of the innovative features of Mercurio and can be further improved by introducing different formalization strategies.

Some rules are independent of each other in the sense that they represent events that do not interact in any way. Other rules instead might represent events that are somehow related, e.g., one event might be a composition of two different events. Moreover, some rules are related to events that involve only one company while others might represent an interaction among different financial players. These considerations generate a rule categorization that also introduces the need for rule ordering. Such ordering is needed when rules are applied to the financial news, in order to ensure correct event recognition.

An interesting idea is to organize and formalize the semantic rules into an ontology. The concepts in the ontology would represent events, and relationships among concepts would describe how events are related to each other and how they interact and depend on each other. Each concept should be related to a set of words (or sentence structures): those that express the corresponding rule. These words could be defined ad hoc according to the semantic rules in Mercurio, but could also originate from external ontologies describing the financial scenario or other domains. This addition helps to enrich the semantic formalization by taking into account both synonyms and new terms.

The use of an ontology would also allow us, through the use of inference, to discover novel information about the formalized data, possibly stimulating the discovery of new events.

3.2   Numerical information

Time-dependent series such as financial indexes are represented as values on a timeline. Each timeslot (e.g. hour or day) is associated, according to the index, with a value, e.g. an opening value, price, closing value, average and so on. The timeline containing these values can be used, in addition to the timeline containing events coming
from textual data, to enrich our data representation for the user. This is possible not only by taking into consideration single values but also by looking at some patterns inside the index.

A first technique is based on Bollinger Bands⁴ that, given a numerical series, provide an upper and a lower band such that the observed values usually oscillate within them. Whenever a value goes beyond these bands, an unusual oscillation is happening. Thus, a trend that goes below the lower band is an unexpected price fall while a trend that goes above the upper band is an unexpected price rise.

⁴ http://www.investopedia.com/terms/b/bollingerbands.asp

A second technique that has been applied in the financial context is the detection of specific patterns, in terms of curve shape, inside financial time series (rather than single interesting points). The financial domain comprises some well-known and meaningful trend patterns [14] such as “double top”, “spike bottom”, “wedge” and so on.

Another interesting approach is to approximate financial time series through the use of segments, for example by using piecewise segmentation [8]. In this way each segment represents a trend in the series; thus, we might have segments representing increasing, stable or decreasing volumes or prices.

Yet another segmentation technique specifically adopted in the financial scenario is based on Turning Points (TP) [15]. TPs are local minimum and maximum points from the historical data and are widely used in technical analysis for predicting the movement of a stock. In fact, they represent the trend of the stock change and can be used to identify the beginning or end of a transaction period.

4   Current implementation

Currently, our system predicts catastrophes by taking into consideration the information coming from financial news, while the part allowing the comparison with financial indexes is not implemented yet. The system comprises three main phases:

1. Data acquisition and management: financial news are extracted from web sources, structured and stored into a relational database; their contents are then cleaned and pre-processed;
2. Event recognition: articles are analyzed to identify both catastrophes and signals. Mercurio adopts the three different approaches introduced in Section 3: (i) semantic approach, (ii) automatic approach and (iii) hybrid approach.
3. Model construction: the events found in the previous step are used in combination with sequential pattern mining to learn a model, represented by means of temporal patterns, to predict the arrival of catastrophes.

4.1   Data acquisition and management

Mercurio currently monitors 250 Italian mid-cap companies; information about them is gathered from important Italian financial and economic web sources such as “Il Sole 24 Ore”, “Radiocor”, “La Repubblica” and “Il Corriere della Sera”. Articles about companies are extracted directly from the newspaper websites and stored into a MySQL database (our initial data contains about 14,000 articles, from year 2010 to 2015), keeping only those that: (i) are part of financial and economic sections and (ii) refer to one of the chosen companies. After this phase the article texts are cleaned by tokenization, stopword elimination and word stemming.

Two different text pre-processing strategies are adopted, one used during the semantic event recognition and the other for the automatic event recognition. In the first strategy we keep all special characters, symbols, punctuation marks, numbers, words, company names and personal details because they are needed by the expert’s rules. In the second strategy these data are not significant, and sometimes even misleading when applying clustering algorithms, thus they are eliminated from the texts.

4.2   Event recognition

Events are detected through text analysis of the financial news. Mercurio implements three event recognition approaches; all of them output a temporal sequence containing the recognized events.

4.2.1   Semantic event recognition.

Mercurio uses a set of rules that formalize the recognition of relevant events inside financial news. Rules define a relationship between some keywords, regular expressions (in general, sentence structures), and corresponding events (e.g. “take” is a keyword related to an acquisition event). An article that contains the expressions defined in a rule is assigned a label corresponding to the event formalized by the rule. Each article is assigned zero, one or more labels depending on the rules it triggers.

Rules capture meanings that go beyond natural language processing alone. For example, financial newspapers usually publish interviews when requested by a company. The question is: why would a company want to be interviewed? When this breaks a trend of non-communication it must be a signal. Also, an article that mentions the gross profit of a company is not a good sign: this indicator does not provide the amount of real revenue of the company and could thus hide a negative trend, whereas the net profit is not ambiguous, so mentioning it is a positive financial communication.

Currently, Mercurio encompasses 30 semantic rules, 7 of which identify catastrophes while the rest formalize signals.

4.2.2   Automatic event recognition.

This approach does not use any a-priori knowledge but relies on the detection of events by only applying clustering algorithms to the pre-processed financial news. Articles are represented in the Vector Space Model [1], where the weight of each term is its TF-IDF weight in the article. Then, articles are clustered using the K-means algorithm and each article is assigned one label, corresponding to the cluster it belongs to.

The process of article clustering has proven to be quite challenging: at the end of the clustering phase we tried to interpret the results and found it impossible to distinguish between clusters representing signals and those representing catastrophes. This was a big drawback from our point of view, since we were not able to understand how to predict catastrophes.

4.2.3   Hybrid event recognition.

To overcome the problem described above, we “added some semantics” to the automatic approach, obtaining what we call the hybrid one. In this approach, catastrophes are found by using the semantic rules that formalize catastrophic events, while signals are obtained by clustering all those articles that were not isolated by the rules defining catastrophic events.
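
The hybrid labeling just described can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the rule names, the regular expressions and the `cluster_of` callback are hypothetical and are not Mercurio’s actual 30 rules or its clustering component. The sketch also assumes that, when several catastrophe rules fire, the first one in rule order is kept, so that each article receives exactly one label.

```python
import re
from typing import Callable, Dict, List

# Hypothetical rules: event names and patterns are illustrative only.
CATASTROPHE_RULES: Dict[str, re.Pattern] = {
    "acquisition": re.compile(r"\b(acquires?|acquired|takes? over|takeover)\b", re.IGNORECASE),
    "stock_crash": re.compile(r"\bshares? (plunge[sd]?|crash(es|ed)?|collapse[sd]?)\b", re.IGNORECASE),
}
SIGNAL_RULES: Dict[str, re.Pattern] = {
    "interview": re.compile(r"\binterview\b", re.IGNORECASE),
    "gross_profit": re.compile(r"\bgross profit\b", re.IGNORECASE),
}


def semantic_labels(text: str, rules: Dict[str, re.Pattern]) -> List[str]:
    """Semantic approach: return every event whose rule matches (zero, one or more)."""
    return [event for event, pattern in rules.items() if pattern.search(text)]


def hybrid_label(text: str, cluster_of: Callable[[str], int]) -> str:
    """Hybrid approach: catastrophe rules first, otherwise fall back to a cluster label.

    `cluster_of` stands in for any clustering step that maps an article to a
    cluster index; it is a placeholder, not part of the original system.
    """
    catastrophes = semantic_labels(text, CATASTROPHE_RULES)
    if catastrophes:
        # Assumption: keep the first matching rule so the article gets one label.
        return catastrophes[0]
    return f"cluster_{cluster_of(text)}"
```

With these toy rules, an article reading “Company A acquires Company B” is labeled as an acquisition, while an article that triggers no catastrophe rule receives the label of whatever cluster the supplied callback assigns to it.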


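
For the Vector Space Model representation used by the automatic approach, the following Python sketch shows one common way to compute TF-IDF weights and compare two articles. The paper does not state which TF-IDF variant is used, so the formula here (relative term frequency multiplied by log(N/df)) is an assumption, as are the function names; the K-means step itself is not shown and would operate on vectors of this kind.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Under this weighting a term that occurs in every article gets weight zero, which is one reason stopword elimination and stemming matter before clustering: they reduce the number of uninformative dimensions.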
4.3   Model construction

The output of the event recognition phase is a sequence of events, each associated with a timestamp that corresponds to the date and time of publishing of the article in which the event was found. Based on this sequence, Mercurio uses Sequential Pattern Mining to find “recurring” temporal patterns in the input data, which are then used to predict future catastrophes.

This step is performed by using AIDA [2], a tool that encompasses both the model creation and prediction features. The tool is applied in two phases: (i) given as input a temporal sequence of events, a specific event e from the sequence and a minimum support threshold, it finds all temporal patterns that end with e and whose support is above the threshold; (ii) given the found model and a real-time flow of previously unseen articles, it predicts the happening of the learned events within a certain time span. In particular, during the prediction phase, each incoming new article is processed and labeled according to the events it triggers. Then, the system tries to match each event to the ones in the patterns of the model. If a match is found, it waits for another event that would match the next event in the pattern. This process is repeated until a pattern expires because of time constraints or its last but one event is reached. In the latter case, we can predict the happening of the next event, which is the one corresponding to the last node of the pattern and which, by construction, is always a catastrophe.

4.4   Experiments

Let us briefly discuss the performance of our prototype and compare the semantic approach (SA) and hybrid approach (HA). First of all, let us recall the differences between the two approaches, in terms of article-event relationships: (i) in SA an article might contain both catastrophes and signals, while in HA this is not possible because clustering is computed only on those articles that do not trigger any catastrophe; (ii) in SA an article might not trigger any rule and thus not generate any event; in HA all the articles are associated with exactly one event, either a catastrophe or a cluster label; (iii) in SA an article might trigger more than one signal, while in HA each article belongs to only one cluster and is thus related to only one signal. These differences make it difficult to qualitatively compare the results of the two approaches: articles that trigger the same events in SA often belong to different clusters in HA.

In the semantic approach we considered 2549 instances of events (556 catastrophes, 1993 signals) and, for each catastrophe, built a model to predict it. The constructed models contain an average of 9 patterns whose lengths vary between 2 and 7. In the hybrid approach we considered 3283 articles (438 catastrophes, 2845 clustered). The constructed models contain an average of 13 patterns whose lengths vary between 2 and 6. The hybrid approach allows us to obtain a greater number of patterns w.r.t. the semantic approach and results in an increase of the average number of patterns for each catastrophe. All the constructed models were tested on previously unseen data to determine the precision and recall of the predictions. We recall that low precision means that there are many wrong predictions, i.e. many times the system predicts a catastrophe which does not actually happen, and low recall means that there are many missed predictions, i.e. many times the system does not predict a catastrophe and the catastrophe actually happens.

The results obtained by applying the two methods vary depending on the catastrophe: (i) some catastrophes cannot be predicted because their model has only one pattern which does not appear in the testing set; (ii) some catastrophes have maximum precision and maximum recall, thus they are perfectly predicted, i.e., there are only right predictions and no wrong or missed ones; (iii) other catastrophes always have maximum precision because the system makes only right predictions about them; however, (iv) some have low recall, which means that many times the catastrophe happens and the system was not able to predict it.

These results strongly depend on the minimum support threshold: the higher the support threshold, the higher the precision and the lower the recall; conversely, the lower the support threshold, the lower the precision and the higher the recall. In general, we noticed that both approaches offer satisfactory performance; however, we are working on making the models more accurate, so that the final prototype will be based on more training data and on an integration of the two techniques.

5   Conclusion

In this paper we discussed Mercurio, a system that supports the decision-making process of investors through the automatic extraction and analysis of financial data, with the aim of predicting extraordinary financial moves. Current results are encouraging but leave room for many improvements, especially related to enrichments of the current model, such as introducing weights and polarity for each event and using statistical information about the whole financial market, its different sectors and each monitored company.

REFERENCES

[1] G. Salton, A. Wong, C. S. Yang. A vector space model for automatic indexing. Commun. ACM 18, 11 (November 1975), 613-620.
[2] M. Mazuran, M. Simoni, L. Tanca. AIDA: Automatic Indexing based on DAta mining. SEBD 2015. pp.176-183.
[3] S. Bacher. Mining Unstructured Financial News to Forecast Intraday Stock Price Movements. PhD Thesis. University of Mannheim. 2012.
[4] G.P.C. Fung, J. Xu Yu, W. Lam. News Sensitive Stock Trend Prediction. PAKDD 2002. pp.481-493.
[5] G. Gidofalvi. Using News Articles to Predict Stock Price Movements. Department of Computer Science and Engineering, University of California, San Diego. 2001.
[6] M.A. Mittermayer. Forecasting Intraday Stock Price Trends with Text Mining Techniques. HICSS 2004.
[7] D. Peramunetilleke, R.K. Wong. Currency exchange rate forecasting from news headlines. ADC 2002. pp.131-139.
[8] V. Lavrenko, M. Schmill, D. Lawrie, P. Ogilvie, D. Jensen, J. Allan. Language models for financial news recommendation. CIKM 2000. pp.389-396.
[9] M.A. Mittermayer, G. F. Knolmayer. NewsCATS: A News Categorization and Trading System. ICDM 2006. pp.1002-1007.
[10] B. Wuthrich, V. Cho, S. W. Leung, D. Peramunetilleke, K. Sankaran, J. Zhang. Daily stock market forecast from textual web data. ICSMC 1998. pp.2720-2725.
[11] T. Hong, I. Han. Knowledge-based data mining of news information on the Internet using cognitive maps and neural networks. Expert Systems with Applications 2002, 23(1):1-8.
[12] K. Kohara, T. Ishikawa, Y. Fukuhara, Y. Nakamura. Stock Price Prediction Using Prior Knowledge and Neural Networks. Intelligent Systems in Accounting, Finance and Management 1997, 6(1):11-22.
[13] F. Wanner, T. Schreck, W. Jentner, L. Sharalieva, D. A. Keim. Relating Interesting Quantitative Time Series Pattern with Text Events and Text Features. SPIE 2013.
[14] T. Fu, F. Chung, V. Ng, R. Luk. Evolutionary Segmentation of financial time series into subsequences. Evolutionary Computation 2001.
[15] J. Yin, Y. Si, Z. Gong. Financial Time Series Segmentation Based On Turning Points. ICSSE 2011.