=Paper=
{{Paper
|id=Vol-1088/paper9
|storemode=property
|title=Trend Template: Mining Trends With a Semi-formal Trend Model
|pdfUrl=https://ceur-ws.org/Vol-1088/paper9.pdf
|volume=Vol-1088
|dblpUrl=https://dblp.org/rec/conf/ijcai/StreibelWTM13
}}
==Trend Template: Mining Trends With a Semi-formal Trend Model==
Olga Streibel, Lars Wißler, Robert Tolksdorf, Danilo Montesi
streibel@inf.fu-berlin.de, lars.wissler@googlemail.com, tolk@ag-nbi.de
Networked Information Systems Group, Freie Universität Berlin, Berlin, Germany
montesi@cs.unibo.it
University of Bologna, Bologna, Italy
Abstract

Predictions of rising or falling trends are helpful in different scenarios in which users have to deal with huge amounts of information in a timely manner, such as during financial analysis. This temporal aspect in various cases of data analysis requires novel data mining techniques. Assuming that a given set of data, e.g. web news, contains information about a potential trend, e.g. financial crisis, it is possible to apply statistical or probabilistic methods in order to find out more information about this trend. However, we argue that in order to understand the context, the structure, and explanation of a trend, it is necessary to take a knowledge-based approach. In our study we define trend mining and propose the application of an ontology-based trend model for mining trends from textual data. We introduce the preliminary definition of trend mining as well as two components of our trend model: the trend template and the trend ontology. Furthermore, we discuss the results of our experiments with trend ontology on the test corpus of German web news. We show that our trend mining approach is relevant for different scenarios in ubiquitous data mining.

1 Introduction

When discussing trends some of us may think about the ups and downs of NASDAQ¹ or DAX² curves, or changes in public opinion on politics before elections. Likewise, one can think about web trends, life style trends or daily trends, i.e. hot topics, in the news or on social networks. Changes in a mobile data stream also fall within the definition of a trend. Understanding a trend as a hot topic is related to the research in Emerging Trend Detection (ETD) and Topic Detection and Tracking (TDT), the subfields of information retrieval [Allan, 2002][Kontostathis et al., 2003]. A trend is defined there as a topic that emerges in interest and utility over time. Accordingly, common examples of trends may be the “Arab Spring”, which emerged in political news worldwide in the beginning of 2011, as well as the financial and real estate crisis, which started to emerge on business news worldwide in 2008. A graphical representation of a trend, based on GoogleTrends³, is shown in Fig. 1.

Figure 1: This graph shows a search volume index for the terms “financial crisis” (blue curve) and “insolvent” (red curve) in Germany from 2006 to 2011. Source: GoogleTrends

Several methods have been proposed for detecting trends in texts or discovering trends in the web news (see Section 3). Other works provide approaches from statistics and time series analysis that can be applied for analyzing trends in non-textual data. Our work contributes to the general understanding of trend mining that we see as highly relevant to ubiquitous data mining. In this paper, we explain our abstract concept of a trend template and go on to describe a trend ontology which is an instance of the trend template.

2 Ubiquitous data mining and trend mining

Ubiquitous Data Mining (UDM) is defined as the essential part of ubiquitous computing [Witten and Eibe, 2005]. UDM techniques help in extracting useful knowledge from data that describes the world in movement, including the aspects of space and time. Time is the necessary dimension for trend mining: there is no trend without time. And a trend is one of the aspects of a world in movement. Before we discuss general trend characteristics, we want to mention the sociological and statistical perspectives on the trend, as well as define trend mining. This helps in understanding the trend characteristics that create the basis for the definition of our trend template later in this paper.

¹ http://www.nasdaq.com/ online accessed 04-17-2013
² http://dax-indices.com/EN/index.aspx?pageID=4 online accessed 04-17-2013
³ http://www.google.com/trends/ online accessed 04-17-2013

2.1 Trend from different perspectives

Detecting trends from the sociological point of view is an analytical method for observing changes in people's behavior over time with regard to “six attitudes towards trends” [Vejlgaard, 2008]. The definition of these six attitudes is based on eight different personality profiles of groups who participate in the trend process: trend creators, trend setters, trend followers, early mainstreamers, mainstreamers, late mainstreamers, conservatives and anti-innovators.

Detecting trends from the statistics perspective is based on trend analysis of time-series data with two goals in mind: “modeling time series (i.e. to gain insight into the mechanisms or underlying forces that generate the time series) and forecasting time series (i.e. to predict the future values of the time-series variables)” [Han and Kamber, 2006]. The trend analysis process consists of four major components: trend or long-term movements, cyclic movements or cyclic variations, seasonal movements or seasonal variations, and irregular or random movements [Han and Kamber, 2006]. A trend, in this context, is an indicator for a change in the data mean [Mitsa, 2010].

2.2 Trend mining

Since data mining can be described as “the extraction of implicit, previously unknown, and potentially useful information from data” [Witten and Eibe, 2005], we propose the use of the term trend mining as defined below:

DEF 2.1 Trend mining is the extraction of implicit, previously unknown and potentially useful knowledge from time-ordered text or data. Trend mining techniques can be used to capture a trend in order to support users by providing previously unknown information and knowledge about the general development in the user's field of interest.

3 Related Research

In general, when mining trends from textual data, at least the following three research areas should be mentioned: emergent trend detection, topic detection and tracking, and temporal data mining.

In [Kontostathis et al., 2003] several systems that detect emerging trends in textual data are presented. These ETD systems are classified into two main categories: semi-automatic and fully-automatic. For each system there is a characterization based on the following aspects: input data and attributes, learning algorithms and visualization. This comparison includes an overview of the research published in [Allan et al., 1998][Lent et al., 1997][Agrawal et al., 1995][Swan and Jensen, 2000][Swan and Allan, 1999][Watts et al., 1997]. TDT research [Allan, 2002] is predominantly related to event-based approaches. Event-based approaches for trend mining rest on the assumption that trends are always triggered by an event, which is often defined in the literature as “something happening” or “something taking place” [Lita Lundquist, 2000]. Considering a trend from the event research perspective means that trend detection has to be understood as a monitoring task. This is mostly the case for so-called short-term trends that are indeed triggered by some events; in order to detect them we have to monitor the stream in which they occur, e.g. the occurrence of the “Eyjafjallajökull eruption”⁴, which was reported in social networks and on the news in March 2010. However, so-called long-term trends, e.g. “financial crisis”, which started to be on-topic in 2008, are not necessarily conjoined with one specific event. They are driven rather by a chain of events or even by “soft” indicators such as public opinion or news. No sharp distinction has been made between the TDT and ETD research fields, which means that some research such as [Swan and Allan, 1999] or [Lavrenko et al., 2000] can in fact be classified into both fields. Temporal data mining research [Mitsa, 2010] offers methods for clustering, classification, dimension reduction and processing of time-series data [Wang et al., 2005]. It addresses temporal data in general and the techniques of time series analysis on these data. One definition of temporal data is “time series data which consist of real valued sampled at regular time intervals” [Mitsa, 2010]. Temporal data mining applies the data mining methodology and deals with the same approaches for classification or clustering that are relevant for mining trends in textual data.

⁴ The eruption of an Icelandic volcano in March 2010 that caused air travel chaos in Europe and revenue lost for the airlines http://www.volcanodiscovery.com/iceland/eyjafjallajoekull.html online accessed 04-17-2013

4 Trend template

Based on our experiments and considerations, we outline the following assumptions about trends in the general context of this work. A trend can be described by the following characteristics: trigger, context, amplitude, direction, time interval, and relation. Fig. 2 illustrates the trend template. In 4.1, we define each characteristic more precisely.

Figure 2: Trend template, an abstract conceptualization

4.1 Definitions

Trigger is a thing. It can be an event, a person, or a topic: anything that triggers the trend. A trigger can but does not have to cause a trend. A trigger makes the trend visible. An example of a trigger is the Lehman Brothers⁵ insolvency, which can be classified as both a topic and an event.

⁵ http://www.lehman.com/ online accessed 04-17-2013

Context is the area of the trigger. If the trigger is a topic then the context is this topic's area, e.g. Lehman Brothers insolvency is mentioned in the context of real estate market.

Amplitude is the strength of a given trend. It can be expressed by a number (the higher the number, the more impact the trend has) or by a qualitative value that describes the trend phase, e.g. beginning (setter), emerging (follower), mainstream, fading (conservative).

Time is necessary when spotting a trend, since there can be no trend without time. It is the interval in which the trend is appearing, independent from the amplitude, e.g. the real estate crisis appeared between the years 2008-2011.

Relation expresses the dependency between the trigger and the context: it puts the given trigger, e.g. Lehman Brothers insolvency, within the given context of the real estate crisis in a relation, e.g. Lehman Brothers insolvency is part of the
real estate crisis.

4.2 Formal description

The trend template is an abstract model that describes the main concepts that are important and necessary for knowledge-based trend mining. In the following, we define the trend template more explicitly:

DEF. 4.1: Trend template (TT) is a quintuple:

    TT := ⟨T, C, R, TW, A⟩

where: T is trigger, C is context, R is relation, TW is time window, and A is amplitude.

DEF. 4.2: T - Trigger is a set of concepts:

    T := {t_0, ..., t_n}, n ∈ N ∧ t ∈ T

so that if E, P, L, To are the sets defining:

    events:    E := {e_0, ..., e_n}, n ∈ N ∧ e ∈ E
    persons:   P := {p_0, ..., p_n}, n ∈ N ∧ p ∈ P
    locations: L := {l_0, ..., l_n}, n ∈ N ∧ l ∈ L
    topics:    To := {to_0, ..., to_n}, n ∈ N ∧ to ∈ To

then:

    T := E ∪ P ∪ To ∪ L

DEF. 4.3: C - Context is a union set consisting of a set of concepts and a set of relations between them, where c is a context element:

    C := C_co ∪ R_co, c ∈ C

with C_co the set of concepts

    C_co := {c_co0, ..., c_con}, n ∈ N ∧ c_co ∈ C_co

and R_co the set of relations:

    R_co := {r_co0, ..., r_con}, n ∈ N ∧ r_co ∈ R_co ∧ R_co ⊆ C_co × C_co

whereas r_co defines a binary relation:

    r_co : c_cox, c_coy → r_co(c_cox, c_coy) ∧ c_cox ≠ c_coy

and the context element is defined by:

    c = c_co ∪ (c_coi, c_coj)
    C = C_co ∪ C_co × C_co

DEF. 4.4: R - Relational is a set of relations:

    R := {r_0, ..., r_n}, n ∈ N ∧ r ∈ R ∧ R := {T × C}

with

    r_i : t_i, c_i → r_i(t_i, c_i)

DEF. 4.5: TW - Time window is a function that assigns a time slice to the time points:

    TP := {t_point | t_point = ms ∨ second ∨ minute ∨ hour ∨ day ∨ month ∨ year}
    TS := ⟨t_point0 ... t_pointn⟩
    TW : TP → TS

DEF. 4.6: A - Amplitude is a function that assigns a value to the quadruple ⟨T, C, R, TW⟩:

    A : T × C × R × TW → N ∪ V

where N is the set of natural numbers and V is the set of categorical values:

    a : (t, c, r, tw) → n ∨ v

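To make the quintuple of DEF. 4.1 more tangible, the following is a minimal Python sketch of one possible in-memory representation. It is not the implementation used in this work (which is ontology-based, see Section 5). All class and field names are illustrative assumptions, the time window is simplified to a pair of years, and the amplitude value in the example is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Tuple, Union


class TriggerKind(Enum):
    """The four trigger sets of DEF. 4.2: T := E ∪ P ∪ To ∪ L."""
    EVENT = "event"
    PERSON = "person"
    TOPIC = "topic"
    LOCATION = "location"


@dataclass(frozen=True)
class Trigger:
    label: str
    kind: TriggerKind


@dataclass(frozen=True)
class Context:
    """DEF. 4.3: a set of concepts plus binary relations between them."""
    concepts: FrozenSet[str]
    relations: FrozenSet[Tuple[str, str]] = frozenset()

    def __post_init__(self):
        for x, y in self.relations:
            if x == y:
                raise ValueError("a context relation must link two distinct concepts")
            if x not in self.concepts or y not in self.concepts:
                raise ValueError("context relations are defined over the context concepts")


# Qualitative trend phases named in Section 4.1 (the categorical values V).
PHASES = ("beginning", "emerging", "mainstream", "fading")


@dataclass(frozen=True)
class Trend:
    """One instance of TT := <T, C, R, TW, A>."""
    trigger: Trigger
    context: Context
    relation: str                 # name of the trigger-context relation r(t, c)
    time_window: Tuple[int, int]  # assumption: simplified to (first year, last year)
    amplitude: Union[int, str]    # natural number or categorical value

    def __post_init__(self):
        start, end = self.time_window
        if start > end:
            raise ValueError("time window must not end before it starts")
        if isinstance(self.amplitude, int):
            if self.amplitude < 0:
                raise ValueError("numeric amplitude must be a natural number")
        elif self.amplitude not in PHASES:
            raise ValueError("categorical amplitude must be one of PHASES")


# Running example from Section 4.1; the amplitude is a hypothetical value.
trend = Trend(
    trigger=Trigger("Lehman Brothers insolvency", TriggerKind.EVENT),
    context=Context(frozenset({"real estate market", "real estate crisis"})),
    relation="is part of",
    time_window=(2008, 2011),
    amplitude="mainstream",
)
```

The validation in `__post_init__` mirrors two constraints of the formal description: context relations link distinct concepts (c_cox ≠ c_coy), and the amplitude maps into N ∪ V.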
5 Trend Ontology

One way of implementing the trend template is the realization of this model in the form of an ontology. We can understand the ontology as an instance of the trend template.

Based on the trend template described above, we created an applicable model, using SKOS⁶ and RDFS/OWL⁷ concepts and properties. Our model serves as a general model that can be extended regarding the particular application domain and applied for annotating a text corpus in order to retrieve the trend structure. The trend ontology is divided into the levels meta, middle and low, which correspond to three abstract layers of the model. Whereas the low level and the middle level relate to the corresponding application domain (in our case it is the German Stock Exchange, DAX), the meta level is the most interesting one. The meta ontology incorporates the general trend characteristics and can be applied to any application domain.

The central concepts of the ontology are Trigger, TriggerCollection, Indication, Relational and ValuePartition. They have been modeled as subconcepts of skos:Concept, skos:Collection and time:TemporalEntity, with different semantic constructions, e.g. skos:related, skos:member. The concepts mirror the composition of the trend template. Trigger consists of three subconcepts: event, person, location.

The main goal of the meta ontology is to offer all necessary concepts and relations in order to span the trend template as a structure over a text corpus. To actually translate a specific document corpus into such a structure, the meta ontology needs to be combined with a domain specific trend ontology which defines domain specific concepts, their keywords and possibly also their relations. This can either be done manually by extracting common terms as keywords and linking them to their respective concepts, or automatically by entity recognition. The pseudocode 6.1 describes the algorithm that we applied to build up the trend description on the test corpus.

⁶ http://www.w3.org/2004/02/skos/ online accessed 04-17-2013
⁷ http://www.w3.org/TR/owl-features/ online accessed 04-17-2013

Algorithm 6.1: CREATE TREND DESCRIPTION(c, o)

    comment: parse every document in the corpus into the ontology
    parse(c, inO, outO) {
        model.read(inO)
        create.reasoner(inO)
        for each d ∈ c do {
            parse(keywords)
            match.model(keywords, inO) {
                for keyword ← 0 to i
                    if inO.concept.label == keyword
                       or keyword ∈ inO.concept.label
                       or keyword.prefix or keyword.postfix ==
                          inO.concept.label.prefix or .postfix
                    then matches.add(keyword)
            }
            relate.model(matches, inO) {
                if model.getRelation(matches).isEmpty
                then model.createRelation(matches)
                else model.incCounter(matches)
            }
        }
        model.write(outO)
    }

6 Experiments

The text corpus that served as our test corpus, which we call German finance data⁸, consists of about 40,500 news articles related to the fields of business and finance, provided as XML files. The corpus is available in German and provides news articles from January 2007 to May 2008. The text was parsed in cooperation with neofonie⁹ from the following sources: comdirect¹⁰, derivatecheck¹¹, Handelsblatt¹², GodmodeTrader¹³, Yahoo¹⁴, Financial Times Deutschland¹⁵, and finanzen.net¹⁶.

⁸ Currently (May 2013) in the publishing process at Linguistic Data Consortium http://www.ldc.upenn.edu/
⁹ http://www.neofonie.de, online accessed 04-25-2012
¹⁰ http://www.comdirect.de/inf/index.html, online accessed 04-25-2012
¹¹ http://derivatecheck.de/, online accessed 04-25-2012
¹² http://www.handelsblatt.com/weblogs/, online accessed 04-25-2012
¹³ http://www.godmode-trader.de/, online accessed 04-25-2012
¹⁴ http://de.biz.yahoo.com/, online accessed 04-25-2012
¹⁵ http://www.ftd.de/, online accessed 04-25-2012
¹⁶ http://www.finanzen.net, online accessed 04-30-2012

In general, the content of the corpus is focused on finance and business information concerning German companies and stocks. It focuses on the situation at DAX, as well as on reviews and ratings of German companies and shares. For evaluation purposes regarding usefulness and practicability, the trend ontology has been filled with two different parts of the test corpus: stock market specific documents in Part 1 and the general business news in Part 2 (subsequently first and second part). They contain over 5,000 and 16,000 documents respectively. We specified several basic questions and respective queries as relevant for trends in general and specifically for stock market trends. Querying the ontology for the total occurrence of concepts yields the following output (shortened to some of the most relevant concepts): Germany (9,137), USA (4,808), Deutsche Telekom (442), Allianz (433), Switzerland (382), Starbucks (104). The output corresponds directly to the corpus of German stock news with a clear focus on German companies followed by the still dominant US market. A similar query for often mentioned lines of business in the context of Germany in contrast to the USA yields a major focus on the industry for Germany. 4.5% to 7.1% of the total occurrences of Germany appear in the context of different lines of industry. The USA is strong in the context of IT (9%) and services (6.9%). Moreover, we checked the so-called topic structure by using our ontology. Here is a general example for the concept Germany:

    trendonto:#Germany (9137) has Topic
        trendonto:#Financial : 1142
        trendonto:#buy : 1003
        trendonto:#MachineBuildingIndustry : 650
        trendonto:#Share : 606
        trendonto:#StockPrice : 562
        trendonto:#Up : 520
        trendonto:#Industry : 510
        trendonto:#Investment : 468
        trendonto:#Supplier : 422
        trendonto:#AutomobilIndustry : 414
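
Counts such as those listed above result from the matching and relation-counting steps of Algorithm 6.1. The following Python sketch illustrates these two steps under strong simplifying assumptions: plain dictionaries and counters stand in for the ontology model and reasoner, the prefix/postfix comparison is reduced to a fixed affix length (the parameter `affix_len` is our assumption, the pseudocode does not specify it), and the concept labels and toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def matches_concept(keyword, label, affix_len=4):
    """Approximate the match rule of Algorithm 6.1: equal labels, keyword
    contained in the label, or a shared prefix/suffix of affix_len characters."""
    k, l = keyword.lower(), label.lower()
    if not k:
        return False
    if k == l or k in l:
        return True
    if len(k) >= affix_len and len(l) >= affix_len:
        return k[:affix_len] == l[:affix_len] or k[-affix_len:] == l[-affix_len:]
    return False


def create_trend_description(corpus, concept_labels):
    """corpus: iterable of keyword lists, one list per document.
    concept_labels: dict mapping a concept name to its labels.
    Returns (concept occurrences, pairwise co-occurrence counters)."""
    occurrences = Counter()
    relations = Counter()
    for keywords in corpus:
        # match.model: collect every concept hit by at least one keyword
        matched = {
            concept
            for concept, labels in concept_labels.items()
            for label in labels
            for keyword in keywords
            if matches_concept(keyword, label)
        }
        occurrences.update(matched)
        # relate.model: a missing counter entry plays the role of createRelation,
        # an existing one is incremented (incCounter)
        for pair in combinations(sorted(matched), 2):
            relations[pair] += 1
    return occurrences, relations


# Hypothetical concepts with German labels and three toy documents.
concepts = {"Germany": ["Deutschland"], "Share": ["Aktie"], "Up": ["steigen"]}
corpus = [
    ["Deutschland", "Aktie"],
    ["Aktien", "steigen"],
    ["Deutschland", "Aktie", "steigt"],
]
occurrences, relations = create_trend_description(corpus, concepts)
```

In this reading, the entry `relations[("Germany", "Share")]` plays the role of a line such as `trendonto:#Share : 606` in the listing for Germany. Dividing such a counter by the total occurrence of the concept gives the shares quoted above, e.g. 650 / 9137 ≈ 7.1% for MachineBuildingIndustry and 414 / 9137 ≈ 4.5% for AutomobilIndustry.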

Figure 3: Performance of shares in the first corpus (5,000 documents) by ontology based ranking and comparison with share indices in the time window July 2007 to July 2011.

In Fig. 3 we show the comparison of the performance values for the stock markets as ranked by the ontology (test based on time window: July 2007 to April 2008) and as actually reported (time window July 2007 to July 2011). Applying the trend ontology to the test set makes it possible to find specific information about the trend that is described in the documents of the test set. The preliminary experimental results that we partially present in this paper show that our idea of a trend template could help in harvesting knowledge from the given test data in a timely manner.

7 Conclusions and future work

This paper presents our research on knowledge-based trend mining, wherein the main contribution is our semi-formal model of a trend template. We showed that the implementation of the trend template in the form of a trend ontology allows for capturing the trend structure out of a test document set. Our experiments confirm that a knowledge-based approach for mining trends out of data allows for extended trend explanations. Currently we are comparing the trend ontology experiment results with the results from adapted K-Means clustering and LDA-based topic modeling algorithms applied on our test set.

Acknowledgments

This work has been partially supported by the “InnoProfile-Corporate Semantic Web” project funded by the German Federal Ministry of Education and Research (BMBF) and the BMBF Innovation Initiative for the New German Länder - Entrepreneurial Regions.

References

[Agrawal et al., 1995] Rakesh Agrawal, Edward L. Wimmers, and Mohamed Zait. Querying shapes of histories. In Proceedings of the 21st VLDB, pages 502–514. Morgan Kaufmann Publishers Inc., 1995.

[Allan et al., 1998] James Allan, Ron Papka, and Victor Lavrenko. On-line new event detection and tracking. In SIGIR'98: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 37–45. ACM, 1998.

[Allan, 2002] James Allan, editor. Topic Detection and Tracking. Event-based Information Organization. Kluwer Academic Publishers, 2002.

[Han and Kamber, 2006] J. Han and M. Kamber. Data Mining Concepts and Techniques. Morgan Kaufmann Publishers Inc., 2006.

[Kontostathis et al., 2003] April Kontostathis, Leon Galitsky, William M. Pottenger, Soma Roy, and Daniel J. Phelps. A Survey of Emerging Trend Detection in Textual Data Mining. Springer-Verlag, 2003.

[Lavrenko et al., 2000] Victor Lavrenko, Matt Schmill, Dawn Lawrie, Paul Ogilvie, David Jensen, and James Allan. Mining of concurrent text and time series. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Workshop on Text Mining, pages 37–44, 2000.

[Lent et al., 1997] Brian Lent, Rakesh Agrawal, and Ramakrishnan Srikant. Discovering trends in text databases. In Proceedings of the KDD'97, pages 227–230. AAAI Press, 1997.

[Lita Lundquist, 2000] Lita Lundquist and Robert J. Jarvella. Language, Text, and Knowledge. Mental Models of Expert Communication. De Gruyter, 2000.

[Mitsa, 2010] Theophano Mitsa, editor. Temporal Data Mining. Chapman Hall/CRC Press, 2010.

[Swan and Allan, 1999] Russell Swan and James Allan. Extracting significant time varying features from text. In CIKM'99: Proceedings of the eighth international conference on Information and knowledge management, pages 38–45. ACM, 1999.

[Swan and Jensen, 2000] Russell Swan and David Jensen. Timemines: Constructing timelines with statistical models of word usage. In KDD-2000 Workshop on Text Mining, 2000.

[Vejlgaard, 2008] Henrik Vejlgaard. Anatomy of A Trend. McGraw-Hill, 2008.

[Wang et al., 2005] X. Wang, K. Smith, and R. Hyndman. Dimension reduction for clustering time series using global characteristics. In Vaidy Sunderam, Geert van Albada, Peter Sloot, and Jack Dongarra, editors, Computational Science - ICCS 2005, volume 3516 of Lecture Notes in Computer Science, pages 11–14. Springer Berlin / Heidelberg, 2005.

[Watts et al., 1997] Robert J. Watts, Alan L. Porter, Scott Cunningham, and Donghua Zhu. Toas intelligence mining; analysis of natural language processing and computational linguistics. In PKDD '97: Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery, pages 323–334. Springer-Verlag, 1997.

[Witten and Eibe, 2005] Ian H. Witten and F. Eibe. Data Mining Concepts and Techniques. Morgan Kaufmann Publishers Inc., 2005.