Trend template: mining trends with a semi-formal trend model
                               Olga Streibel, Lars Wißler, Robert Tolksdorf, Danilo Montesi
                           streibel@inf.fu-berlin.de, lars.wissler@googlemail.com, tolk@ag-nbi.de
                       Networked Information Systems Group, Freie Universität Berlin, Berlin, Germany
                                                      montesi@cs.unibo.it
                                            University of Bologna, Bologna, Italy

Abstract

Predictions of rising or falling trends are helpful in scenarios in which users have to deal with huge amounts of information in a timely manner, such as during financial analysis. This temporal aspect in various cases of data analysis requires novel data mining techniques. Assuming that a given set of data, e.g. web news, contains information about a potential trend, e.g. financial crisis, it is possible to apply statistical or probabilistic methods in order to find out more information about this trend. However, we argue that in order to understand the context, the structure, and the explanation of a trend, it is necessary to take a knowledge-based approach. In our study we define trend mining and propose the application of an ontology-based trend model for mining trends from textual data. We introduce the preliminary definition of trend mining as well as two components of our trend model: the trend template and the trend ontology. Furthermore, we discuss the results of our experiments with the trend ontology on a test corpus of German web news. We show that our trend mining approach is relevant for different scenarios in ubiquitous data mining.

1 Introduction

When discussing trends some of us may think about the ups and downs of the NASDAQ¹ or DAX² curves, or changes in public opinion on politics before elections. Likewise, one can think about web trends, life style trends or daily trends, i.e. hot topics, in the news or on social networks. Changes in a mobile data stream also fall within the definition of a trend. Understanding a trend as a hot topic is related to the research in Emerging Topic Detection (ETD) and Topic Detection and Tracking (TDT), the subfields of information retrieval [Allan, 2002][Kontostathis et al., 2003]. A trend is defined there as a topic that emerges in interest and utility over time. Accordingly, common examples of trends may be the “Arab Spring”, which emerged in political news worldwide in the beginning of 2011, as well as the financial and real estate crisis, which started to emerge in business news worldwide in 2008. A graphical representation of a trend, based on GoogleTrends³, is shown in Fig. 1.

Figure 1: This graph shows a search volume index for the terms “financial crisis” (blue curve) and “insolvent” (red curve) in Germany from 2006 to 2011. Source: GoogleTrends

Several methods have been proposed for detecting trends in texts or discovering trends in web news (see Section 3). Other works provide approaches from statistics and time series analysis that can be applied for analyzing trends in non-textual data. Our work contributes to the general understanding of trend mining, which we see as highly relevant to ubiquitous data mining. In this paper, we explain our abstract concept of a trend template and go on to describe a trend ontology which is an instance of the trend template.

2 Ubiquitous data mining and trend mining

Ubiquitous Data Mining (UDM) is defined as an essential part of ubiquitous computing [Witten and Eibe, 2005]. UDM techniques help in extracting useful knowledge from data that describes the world in movement, including the aspects of space and time. Time is the necessary dimension for trend mining: there is no trend without time. And a trend is one of the aspects of a world in movement. Before we discuss general trend characteristics, we want to mention the sociological and statistical perspectives on the trend, as well as define trend mining. This helps in understanding the trend characteristics that create the basis for the definition of our trend template later in this paper.

¹ http://www.nasdaq.com/ online accessed 04-17-2013
² http://dax-indices.com/EN/index.aspx?pageID=4 online accessed 04-17-2013
³ http://www.google.com/trends/ online accessed 04-17-2013
2.1 Trend from different perspectives

Detecting trends from the sociological point of view is an analytical method for observing changes in people's behavior over time with regard to “six attitudes towards trends” [Vejlgaard, 2008]. The definition of these six attitudes is based on eight different personality profiles of groups who participate in the trend process: trend creators, trend setters, trend followers, early mainstreamers, mainstreamers, late mainstreamers, conservatives and anti-innovators.

Detecting trends from the statistics perspective is based on trend analysis of time-series data with two goals in mind: “modeling time series (i.e. to gain insight into the mechanisms or underlying forces that generate the time series) and forecasting time series (i.e. to predict the future values of the time-series variables)” [Han and Kamber, 2006]. The trend analysis process consists of four major components: trend or long-term movements, cyclic movements or cyclic variations, seasonal movements or seasonal variations, and irregular or random movements [Han and Kamber, 2006]. A trend, in this context, is an indicator for a change in the data mean [Mitsa, 2010].

2.2 Trend mining

Since data mining can be described as “the extraction of implicit, previously unknown, and potentially useful information from data” [Witten and Eibe, 2005], we propose the use of the term trend mining as defined below:

DEF 2.1 Trend mining is the extraction of implicit, previously unknown and potentially useful knowledge from time-ordered text or data. Trend mining techniques can be used to capture a trend in order to support users by providing previously unknown information and knowledge about the general development in their field of interest.

3 Related Research

In general, when mining trends from textual data, at least the following three research areas should be mentioned: emergent trend detection, topic detection and tracking, and temporal data mining.

In [Kontostathis et al., 2003] several systems that detect emerging trends in textual data are presented. These ETD systems are classified into two main categories: semi-automatic and fully-automatic. Each system is characterized by the following aspects: input data and attributes, learning algorithms and visualization. This comparison includes an overview of the research published in [Allan et al., 1998][Lent et al., 1997][Agrawal et al., 1995][Swan and Jensen, 2000][Swan and Allan, 1999][Watts et al., 1997].

TDT research [Allan, 2002] is predominantly related to event-based approaches. Event-based approaches for trend mining rest on the assumption that trends are always triggered by an event, which is often defined in the literature as “something happening” or “something taking place” [Lita Lundquist, 2000]. Considering a trend from the event research perspective means that trend detection has to be understood as a monitoring task. This is mostly the case for so-called short-term trends that are indeed triggered by some event; in order to detect them we have to monitor the stream in which they occur, e.g. the occurrence of the “Eyjafjallajökull eruption”⁴, which was reported in social networks and on the news in March 2010. However, so-called long-term trends, e.g. the “financial crisis” that started to be on-topic in 2008, are not necessarily tied to one specific event. They are rather driven by a chain of events or even by “soft” indicators such as public opinion or news. No sharp distinction has been made between the TDT and ETD research fields, which means that some research, such as [Swan and Allan, 1999] or [Lavrenko et al., 2000], can in fact be classified into both fields.

Temporal data mining research [Mitsa, 2010] offers methods for clustering, classification, dimension reduction and processing of time-series data [Wang et al., 2005]. It addresses temporal data in general and the techniques of time series analysis on these data. One definition of temporal data is “time series data which consist of real valued sampled at regular time intervals” [Mitsa, 2010]. Temporal data mining applies the data mining methodology and deals with the same approaches for classification or clustering that are relevant for mining trends in textual data.

⁴ The eruption of an Icelandic volcano in March 2010 that caused air travel chaos in Europe and lost revenue for the airlines http://www.volcanodiscovery.com/iceland/eyjafjallajoekull.html online accessed 04-17-2013

4 Trend template

Based on our experiments and considerations, we outline the following assumptions about trends in the general context of this work. A trend can be described by the following characteristics: trigger, context, amplitude, direction, time interval, and relation. Fig. 2 illustrates the trend template. In 4.1, we define each characteristic more precisely.

4.1 Definitions

Trigger can be an event, a person, or a topic: anything that triggers the trend. A trigger can but does not have to cause a trend. A trigger makes the trend visible. An example of a trigger is the Lehman Brothers⁵ insolvency, which can be classified as both a topic and an event.

Context is the area of the trigger. If the trigger is a topic then the context is this topic's area, e.g. the Lehman Brothers insolvency is mentioned in the context of the real estate market.

Amplitude is the strength of a given trend. It can be expressed by a number (the higher the number, the more impact the trend has) or by a qualitative value that describes the trend phase, e.g. beginning (setter), emerging (follower), mainstream, fading (conservative).

Time is necessary for spotting a trend, since there can be no trend without time. It is the interval in which the trend appears, independent of the amplitude, e.g. the real estate crisis appeared between the years 2008-2011.

Relation expresses the dependency between the trigger and the context. It puts the given trigger, e.g. the Lehman Brothers insolvency, in a relation with the given context, e.g. the real estate crisis: the Lehman Brothers insolvency is part of the real estate crisis.

⁵ http://www.lehman.com/ online accessed 04-17-2013

Figure 2: Trend template, an abstract conceptualization

4.2 Formal description

The trend template is an abstract model that describes the main concepts that are important and necessary for knowledge-based trend mining. In the following, we define the trend template more explicitly:

DEF. 4.1: Trend template (TT) is a quintuple:

$$TT := \langle T, C, R, TW, A \rangle$$

where $T$ is the trigger, $C$ is the context, $R$ is the relation, $TW$ is the time window, and $A$ is the amplitude.

DEF. 4.2: T (Trigger) is a set of concepts:

$$T := \{t_0, \ldots, t_n\},\quad n \in \mathbb{N} \wedge t \in T$$

so that if $E$, $P$, $L$, $To$ are the sets defining:

events: $E := \{e_0, \ldots, e_n\},\ n \in \mathbb{N} \wedge e \in E$
persons: $P := \{p_0, \ldots, p_n\},\ n \in \mathbb{N} \wedge p \in P$
locations: $L := \{l_0, \ldots, l_n\},\ n \in \mathbb{N} \wedge l \in L$
topics: $To := \{to_0, \ldots, to_n\},\ n \in \mathbb{N} \wedge to \in To$

then:

$$T := E \cup P \cup To \cup L$$

DEF. 4.3: C (Context) is a union set consisting of a set of concepts and a set of relations between them, where $c$ is a context element:

$$C := C_{co} \cup R_{co},\quad c \in C$$

with $C_{co}$ the set of concepts

$$C_{co} := \{c_{co_0}, \ldots, c_{co_n}\},\quad n \in \mathbb{N} \wedge c_{co} \in C_{co}$$

and $R_{co}$ the set of relations:

$$R_{co} := \{r_{co_0}, \ldots, r_{co_n}\},\quad n \in \mathbb{N} \wedge r_{co} \in R_{co} \wedge R_{co} \subseteq C_{co} \times C_{co}$$

whereas $r_{co}$ defines a binary relation:

$$r_{co} : c_{co_x}, c_{co_y} \longrightarrow r_{co}(c_{co_x}, c_{co_y}) \wedge c_{co_x} \neq c_{co_y}$$

and the context element is defined by:

$$c = c_{co} \cup (c_{co_i}, c_{co_j})$$
$$C = C_{co} \cup C_{co} \times C_{co}$$

DEF. 4.4: R (Relational) is a set of relations:

$$R := \{r_0, \ldots, r_n\},\quad n \in \mathbb{N} \wedge r \in R \wedge R := \{T \times C\}$$

with

$$r_i : t_i, c_i \longrightarrow r_i(t_i, c_i)$$

DEF. 4.5: TW (Time window) is a function that assigns a time slice to the time points:

$$TP := \{t_{point} \mid t_{point} = ms \vee second \vee minute \vee hour \vee day \vee month \vee year\}$$
$$TS := \langle t_{point_0} \ldots t_{point_n} \rangle$$
$$TW : TP \longrightarrow TS$$

DEF. 4.6: A (Amplitude) is a function that assigns a value to the quadruple $\langle T, C, R, TW \rangle$:

$$A : T \times C \times R \times TW \longrightarrow \mathbb{N} \cup V$$

where $\mathbb{N}$ is the set of natural numbers and $V$ is the set of categorical values:

$$a : (t, c, r, tw) \longrightarrow n \vee v$$
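The quintuple of DEF. 4.1 can be mirrored as a small data structure. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: all class and field names (TrendTemplate, TriggerKind, Phase, and so on) are hypothetical, the day boundaries of the time window and the amplitude value are chosen only for illustration, and the Lehman Brothers example is taken from Section 4.1.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import FrozenSet, Tuple, Union


class TriggerKind(Enum):
    """Trigger subsets of DEF. 4.2: T = E ∪ P ∪ To ∪ L."""
    EVENT = "event"
    PERSON = "person"
    TOPIC = "topic"
    LOCATION = "location"


class Phase(Enum):
    """Categorical amplitude values V, named after the phases in Section 4.1."""
    BEGINNING = "beginning"
    EMERGING = "emerging"
    MAINSTREAM = "mainstream"
    FADING = "fading"


@dataclass(frozen=True)
class Trigger:
    label: str
    kinds: FrozenSet[TriggerKind]  # a trigger may be both a topic and an event


@dataclass(frozen=True)
class Context:
    concepts: FrozenSet[str]                             # C_co
    relations: FrozenSet[Tuple[str, str]] = frozenset()  # R_co ⊆ C_co × C_co

    def __post_init__(self):
        # DEF. 4.3: a context relation links two distinct known concepts.
        for x, y in self.relations:
            if x == y or x not in self.concepts or y not in self.concepts:
                raise ValueError(f"invalid context relation: ({x}, {y})")


@dataclass(frozen=True)
class Relation:
    name: str          # e.g. "partOf"
    trigger: Trigger
    context: Context


@dataclass(frozen=True)
class TimeWindow:
    start: date
    end: date

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("time window ends before it starts")


@dataclass(frozen=True)
class TrendTemplate:
    trigger: Trigger
    context: Context
    relation: Relation
    time_window: TimeWindow
    amplitude: Union[int, Phase]  # DEF. 4.6: a number or a categorical value


# Example from Section 4.1; the exact dates and the amplitude are assumptions.
lehman = Trigger("Lehman Brothers insolvency",
                 frozenset({TriggerKind.TOPIC, TriggerKind.EVENT}))
crisis = Context(frozenset({"real estate crisis", "real estate market"}))
trend = TrendTemplate(
    trigger=lehman,
    context=crisis,
    relation=Relation("partOf", lehman, crisis),
    time_window=TimeWindow(date(2008, 1, 1), date(2011, 12, 31)),
    amplitude=Phase.MAINSTREAM,
)
```

Immutable (frozen) records are used here because a trend description is a fact about a corpus at a given time; a new observation would yield a new TrendTemplate instance rather than a mutation of an old one.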
5 Trend Ontology

One way of implementing the trend template is the realization of this model in the form of an ontology. We can understand the ontology as an instance of the trend template.

Based on the trend template described above, we created an applicable model, using SKOS⁶ and RDFS/OWL⁷ concepts and properties. Our model serves as a general model that can be extended for a particular application domain and applied for annotating a text corpus in order to retrieve the trend structure. The trend ontology is divided into the levels meta, middle and low, which correspond to three abstract layers of the model. Whereas the low level and the middle level relate to the corresponding application domain (in our case the German Stock Exchange, DAX), the meta level is the most interesting one. The meta ontology incorporates the general trend characteristics and can be applied to any application domain.

The central concepts of the ontology are Trigger, TriggerCollection, Indication, Relational and ValuePartition. They have been modeled as subconcepts of skos:Concept, skos:Collection and time:TemporalEntity, with different semantic constructions, e.g. skos:related, skos:member. The concepts mirror the composition of the trend template. Trigger consists of three subconcepts: event, person, location.

Algorithm 6.1: CREATETRENDDESCRIPTION(c, o)

    comment: parse ∀ document ∈ corpus into ontology
    parse(c, inO, outO) {
        model.read(inO)
        create.reasoner(inO)
        for each d ∈ c do {
            parse(keywords);
            match.model(keywords, inO) {
                for keyword ← 0 to i
                    if inO.concept.label == keyword
                       or keyword ∈ inO.concept.label
                       or keyword.prefix or keyword.postfix ==
                          inO.concept.label.prefix or .postfix
                    then matches.add(keyword)
            }
            relate.model(matches, inO) {
                if model.getRelation(matches).isEmpty
                    then model.createRelation(matches)
                    else model.incCounter(matches)
            }
        }
        model.write(outO)
    }
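The matching and relating steps of Algorithm 6.1 can be sketched in Python. This is a simplified illustration and not the authors' code: the ontology is assumed to be a plain dictionary from concept name to keyword labels, each document is assumed to be already reduced to a list of keywords, reading and writing the ontology file and the reasoner are omitted, and the fixed affix length AFFIX_LEN for the prefix/postfix test is a hypothetical parameter that the pseudocode does not specify.

```python
from collections import Counter
from itertools import combinations

AFFIX_LEN = 5  # assumed length of the compared prefix/postfix


def matches_label(keyword, label, affix_len=AFFIX_LEN):
    """Label test of Algorithm 6.1: equality, containment, or shared affix."""
    kw, lb = keyword.lower(), label.lower()
    if not kw:
        return False
    if kw == lb or kw in lb:
        return True
    if len(kw) >= affix_len and len(lb) >= affix_len:
        return kw[:affix_len] == lb[:affix_len] or kw[-affix_len:] == lb[-affix_len:]
    return False


def create_trend_description(corpus, ontology):
    """Count concept occurrences and pairwise concept relations per document."""
    occurrences = Counter()
    relations = Counter()
    for keywords in corpus:
        # match.model: collect every concept with a label matching a keyword
        matched = {
            concept
            for concept, labels in ontology.items()
            for label in labels
            for kw in keywords
            if matches_label(kw, label)
        }
        occurrences.update(matched)
        # relate.model: the first increment creates a relation,
        # later increments raise its counter
        for pair in combinations(sorted(matched), 2):
            relations[pair] += 1
    return occurrences, relations


# Toy ontology and corpus, invented for illustration only.
ontology = {
    "Germany": ["Germany", "German"],
    "Share": ["share", "stock"],
    "Up": ["up", "rise"],
}
corpus = [
    ["German", "shares", "rise"],
    ["stock", "up"],
    ["Germany", "weather"],
]
occurrences, relations = create_trend_description(corpus, ontology)
print(occurrences)
print(relations)
```

Using a Counter makes the createRelation/incCounter branch of the pseudocode implicit: a missing pair starts at zero, so a single increment covers both cases.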
The main goal of the meta ontology is to offer all necessary concepts and relations in order to span the trend template as a structure over a text corpus. To actually translate a specific document corpus into such a structure, the meta ontology needs to be combined with a domain-specific trend ontology which defines domain-specific concepts, their keywords and possibly also their relations. This can either be done manually, by extracting common terms as keywords and linking them to their respective concepts, or automatically, by entity recognition. The pseudocode 6.1 describes the algorithm that we applied to build up the trend description on the test corpus.

6 Experiments

The text corpus that served as our test corpus, which we call German finance data⁸, consists of about 40,500 news articles related to the fields of business and finance, provided as XML files. The corpus is available in German and provides news articles from January 2007 to May 2008. The text was parsed in cooperation with neofonie⁹ from the following sources: comdirect¹⁰, derivatecheck¹¹, Handelsblatt¹², GodmodeTrader¹³, Yahoo¹⁴, Financial Times Deutschland¹⁵, and finanzen.net¹⁶.

⁶ http://www.w3.org/2004/02/skos/ online accessed 04-17-2013
⁷ http://www.w3.org/TR/owl-features/ online accessed 04-17-2013
⁸ Currently (May 2013) in the publishing process at Linguistic Data Consortium http://www.ldc.upenn.edu/
⁹ http://www.neofonie.de, online accessed 04-25-2012
¹⁰ http://www.comdirect.de/inf/index.html, online accessed 04-25-2012
¹¹ http://derivatecheck.de/, online accessed 04-25-2012
¹² http://www.handelsblatt.com/weblogs/, online accessed 04-25-2012
¹³ http://www.godmode-trader.de/, online accessed 04-25-2012
¹⁴ http://de.biz.yahoo.com/, online accessed 04-25-2012
¹⁵ http://www.ftd.de/, online accessed 04-25-2012
¹⁶ http://www.finanzen.net, online accessed 04-30-2012

In general, the content of the corpus is focused on finance and business information concerning German companies and stocks. It focuses on the situation at DAX, as well as on reviews and ratings of German companies and shares. To evaluate usefulness and practicability, the trend ontology has been filled with two different parts of the test corpus: stock-market-specific documents in Part 1 and general business news in Part 2 (subsequently first and second part). They contain over 5,000 and 16,000 documents respectively. We specified several basic questions and respective queries as relevant for trends in general and specifically for stock market trends. Querying the ontology for the total occurrence of concepts yields the following output (shortened to some of the most relevant concepts): Germany (9,137), USA (4,808), Deutsche Telekom (442), Allianz (433), Switzerland (382), Starbucks (104). The output corresponds directly to the corpus of German stock news, with a clear focus on German companies followed by the still dominant US market. A similar query for frequently mentioned lines of business in the context of Germany in contrast to the USA yields a major focus on industry for Germany: 4.5% to 7.1% of the total occurrences of Germany appear in the context of different lines of industry. The USA is strong in the context of IT (9%) and services (6.9%). Moreover, we checked the so-called topic structure by using our ontology. Here is a general example for the concept Germany:

    trendonto:#Germany (9137) has Topic
    trendonto:#Financial : 1142
    trendonto:#buy : 1003
    trendonto:#MachineBuildingIndustry : 650
    trendonto:#Share : 606
    trendonto:#StockPrice : 562
    trendonto:#Up : 520
    trendonto:#Industry : 510
    trendonto:#Investment : 468
    trendonto:#Supplier : 422
    trendonto:#AutomobilIndustry : 414
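The topic counts listed for the concept Germany can be turned into shares of its 9,137 total occurrences. The short Python sketch below does this arithmetic; the variable names are hypothetical and the counts are copied from the listing. Under this reading, the industry-related topics MachineBuildingIndustry and AutomobilIndustry come out at about 7.1% and 4.5%, which appears consistent with the 4.5% to 7.1% range reported for lines of industry, although the paper does not state that the range was derived from exactly these rows.

```python
# Counts copied from the topic structure listing for trendonto:#Germany.
germany_total = 9137
topic_counts = {
    "Financial": 1142,
    "buy": 1003,
    "MachineBuildingIndustry": 650,
    "Share": 606,
    "StockPrice": 562,
    "Up": 520,
    "Industry": 510,
    "Investment": 468,
    "Supplier": 422,
    "AutomobilIndustry": 414,
}

# Share of each topic relative to all occurrences of Germany, in percent.
shares = {
    topic: round(100 * count / germany_total, 1)
    for topic, count in topic_counts.items()
}

for topic, pct in shares.items():
    print(f"{topic:<24}{pct:5.1f}%")
```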
Figure 3: Performance of shares in the first corpus (5,000 documents) by ontology based ranking and comparison with share indices in the time window July 2007 to July 2011.

In Fig. 3 we compare the performance values for the stock markets as ranked by the ontology (test based on the time window July 2007 to April 2008) with the values actually reported (time window July 2007 to July 2011). Applying the trend ontology to the test set makes it possible to find specific information about a particular trend that is described in the documents of the test set. The preliminary experimental results that we partially present in this paper show that our idea of a trend template could help in harvesting knowledge from the given test data in a timely manner.

7 Conclusions and future work

This paper presents our research on knowledge-based trend mining, wherein the main contribution is our semi-formal model of a trend template. We showed that the implementation of the trend template in the form of a trend ontology allows for capturing the trend structure out of a test document set. Our experiments confirm that a knowledge-based approach for mining trends out of data allows for extended trend explanations. Currently we are comparing the trend ontology experiment results with the results from adapted K-Means clustering and LDA-based topic modeling algorithms applied to our test set.

Acknowledgments

This work has been partially supported by the “InnoProfile-Corporate Semantic Web” project funded by the German Federal Ministry of Education and Research (BMBF) and the BMBF Innovation Initiative for the New German Länder - Entrepreneurial Regions.

References

[Agrawal et al., 1995] Rakesh Agrawal, Edward L. Wimmers, and Mohamed Zait. Querying shapes of histories. In Proceedings of the 21st VLDB, pages 502–514. Morgan Kaufmann Publishers Inc., 1995.

[Allan et al., 1998] James Allan, Ron Papka, and Victor Lavrenko. On-line new event detection and tracking. In SIGIR'98: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 37–45. ACM, 1998.

[Allan, 2002] James Allan, editor. Topic Detection and Tracking. Event-based Information Organization. Kluwer academic publishers, 2002.

[Han and Kamber, 2006] J. Han and M. Kamber. Data Mining Concepts and Techniques. Morgan Kaufmann Publishers Inc., 2006.

[Kontostathis et al., 2003] April Kontostathis, Leon Galitsky, William M. Pottenger, Soma Roy, and Daniel J. Phelps. A Survey of Emerging Trend Detection in Textual Data Mining. Springer-Verlag, 2003.

[Lavrenko et al., 2000] Victor Lavrenko, Matt Schmill, Dawn Lawrie, Paul Ogilvie, David Jensen, and James Allan. Mining of concurrent text and time series. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Workshop on Text Mining, pages 37–44, 2000.

[Lent et al., 1997] Brian Lent, Rakesh Agrawal, and Ramakrishnan Srikant. Discovering trends in text databases. In Proceedings of the KDD'97, pages 227–230. AAAI Press, 1997.

[Lita Lundquist, 2000] Robert J. Jarvella Lita Lundquist. Language, Text, and Knowledge. Mental Models of Expert Communication. De Gruyter, 2000.

[Mitsa, 2010] Theophano Mitsa, editor. Temporal Data Mining. Chapman Hall/CRC Press, 2010.

[Swan and Allan, 1999] Russell Swan and James Allan. Extracting significant time varying features from text. In CIKM'99: Proceedings of the eighth international conference on Information and knowledge management, pages 38–45. ACM, 1999.

[Swan and Jensen, 2000] Russell Swan and David Jensen. Timemines: Constructing timelines with statistical models of word usage. In KDD-2000 Workshop on Text Mining, 2000.

[Vejlgaard, 2008] Henrik Vejlgaard. Anatomy of A Trend. McGraw-Hill, 2008.

[Wang et al., 2005] X. Wang, K. Smith, and R. Hyndman. Dimension reduction for clustering time series using global characteristics. In Vaidy Sunderam, Geert van Albada, Peter Sloot, and Jack Dongarra, editors, Computational Science - ICCS 2005, volume 3516 of Lecture Notes in Computer Science, pages 11–14. Springer Berlin / Heidelberg, 2005.

[Watts et al., 1997] Robert J. Watts, Alan L. Porter, Scott Cunningham, and Donghua Zhu. Toas intelligence mining; analysis of natural language processing and computational linguistics. In PKDD '97: Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery, pages 323–334. Springer-Verlag, 1997.

[Witten and Eibe, 2005] Ian H. Witten and F. Eibe. Data Mining Concepts and Techniques. Morgan Kaufmann Publishers Inc, 2005.