<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining Trends in Texts on the Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olga Streibel</string-name>
          <email>streibel@inf.fu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Supervisor: Dr.-Ing. Robert Tolksdorf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Networked Information Systems, Free University Berlin</institution>
          ,
          <addr-line>Konigin-Luise-Str.24-26 , 14195 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>From online news and blog articles, a human can often deduce the information and knowledge needed to predict market movements or sociological trends. However, this recognition and comprehension process is very complex and requires experience as well as some context knowledge about the domain in which trends are to be detected. In order to support human experts in trend analysis, I propose an automatic trend mining method based on a knowledge-integrating learning approach.</p>
      </abstract>
      <kwd-group>
        <kwd>trend mining</kwd>
        <kwd>machine learning</kwd>
        <kwd>knowledge acquisition</kwd>
        <kwd>knowledge integration</kwd>
        <kwd>semantic learning</kwd>
        <kwd>tagging</kwd>
        <kwd>folksonomy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        topic area that is growing in interest and utility over time" [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] whereas topic
in terms of Topic Detection and Tracking (TDT)[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] research is "defined to be a
set of news stories that are strongly related by some seminal real world event".
All of these points of view on trend detection show the different dimensions of
trend analysis research. However, they have one thing in common: they observe
patterns of change that are based on certain variables (e.g., people, numbers,
words) and that lead to a general change, the emerging trend, in the system that
depends on these variables.
      </p>
      <p>
        As already defined in my trend ontology approach [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], this research uses trend
mining as a general term covering trend detection, trend recognition and
trending analysis. It can refer either to the detection of emerging topic areas through text
analysis or to the detection of trends through numeric data analysis, as in the
case of stock values. However, this work focuses only on textual data available
on the Web, i.e. online news and blogs, and on learning from this data while
including related background knowledge in order to capture and explain trends.
In general, I use the term trend in texts to refer to "emerging topic areas" (see also Section 4),
whereas the objective of mining trends is "to provide an
alert that new developments are happening in a specific area of interest in an
automated way" [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>Interesting approaches have been developed in the field of trend mining on texts
(see the following section), but they still lack the integration of expert
knowledge into the process of trend recognition. Such knowledge is crucial for proper
trend mining, and the lack of methods that integrate expert knowledge is a
research gap. This thesis aims at closing this gap. It treats trend detection
as a complex learning task based on learning and recognizing
complex relations and dependencies in a given domain with regard to the time dimension.
I focus on a learning method that is able to integrate expert knowledge in order to
automatically recognize trends in text collections.</p>
      <p>
        Considering that "In general, trending analysis of textual data can be performed
in any domain that involves written records of human endeavors whether
scientific or artistic in nature." [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], trend mining based on texts is useful for many
application domains, e.g. medical diagnosis, opinion mining, market monitoring,
and stock market analysis. Given the increasing availability of information
on the Internet and the resulting need for intelligent data analysis, it has become an increasingly
important research topic in recent years. Besides contributing to
trend mining research, this thesis can have an important impact on machine
learning and on the Semantic Web.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Main questions of the thesis</title>
      <p>
        Two main research questions are important for this thesis: 1) How can
existing machine learning approaches for trend mining be changed into knowledge-integrating
learning approaches with regard to the development of the Semantic Web? 2)
How can trend knowledge be acquired and formalized?
      </p>
      <p>
        The main research projects in the field of trend mining are described in Topic
Detection and Tracking (TDT) research [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and in Emergent Trend Detection (ETD) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Regarding work relevant for this thesis, I first concentrate on the research done
in the field of trend mining with a focus on machine learning algorithms, since
they seem to be crucial for automatic trend mining. One such project, the
EAnalyst system described in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], showed that emerging trends can be determined and detected early
from numeric data as well as from texts.
EAnalyst has been designed and implemented as a general architecture for the
association of news stories with trends. The system collects hybrid data,
financial time series and time-stamped news stories, redescribes the time series data into
"high-level features", called trends, and aligns each trend with time-stamped
news stories. These news stories serve as the training set for learning a language
model that captures the statistics of word usage patterns in the stories. This
language model, learnt for every trend type, helps to monitor a stream of
incoming news stories: the model processes new stories according to the learnt
hypothesis. The authors define the task of trend detection as a special case of
activity monitoring as introduced by [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This research justifies the
general precondition of my thesis: it is possible to automatically recognize trends
by analyzing texts. Unlike EAnalyst, I do not elaborate on text stream
monitoring but focus more on the recognition and comprehension process for
trend mining.
      </p>
      <p>
        Emergent Trend Detection (ETD) systems, which are concerned with the detection of trends and are
presented in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], have been characterized based on the following aspects: input
data and attributes, learning algorithms, and visualization, all of which are important
for creating a trend analysis system. The most relevant perspective of comparison
for this work is the learning algorithms. According to the system descriptions in
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and regarding the prototypes [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ][
        <xref ref-type="bibr" rid="ref17">17</xref>
        ][
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the following learning algorithms have
proven useful for trend mining:
      </p>
      <list list-type="bullet">
        <list-item><p>combined "hypothesis testing"-based methods (TimeMines [<xref ref-type="bibr" rid="ref24">24</xref>])</p></list-item>
        <list-item><p>single-pass clustering (New Event Detection [<xref ref-type="bibr" rid="ref4">4</xref>])</p></list-item>
        <list-item><p>sequential pattern matching and shape query processing (Patent Miner [<xref ref-type="bibr" rid="ref16">16</xref>][<xref ref-type="bibr" rid="ref1">1</xref>])</p></list-item>
        <list-item><p>feed-forward, backpropagation NN, C4.5 and SVM (Hierarchical Distributed Dynamic Indexing [<xref ref-type="bibr" rid="ref20">20</xref>], Wüthrich [<xref ref-type="bibr" rid="ref27">27</xref>])</p></list-item>
        <list-item><p>k-NN classifier (Wüthrich [<xref ref-type="bibr" rid="ref27">27</xref>])</p></list-item>
        <list-item><p>regression analysis (Wüthrich [<xref ref-type="bibr" rid="ref27">27</xref>])</p></list-item>
      </list>
      <p>
        Besides, there is much further research related to trend mining, e.g. trend
detection based on a fuzzy temporal profile model [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], modeling bursty streams using
an infinite-state automaton [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], a finite mixture model for tracking the dynamics of topic
trends [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], and clustering approaches [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Concerning both trend mining based on texts and enhanced text analysis,
there are many related projects on the Internet, scientific and commercial, as
well as services that are to some extent relevant for this work: GoogleTrends<sup>1</sup>,
BlogPulse<sup>2</sup>, and OpenCalais<sup>3</sup>. Two interesting research projects, GIDA (Generic
Information-based Decision Assistant) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and its successor TREMA (Trend
Mining, Fusion and Analysis of multimodal Data) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], which concentrated on the
fusion of multimodal market data in order to mine trends in financial markets
(GIDA, TREMA) and in market research (TREMA), are relevant for this thesis.
Several projects that concern themselves with lightweight ontologies and
extended vocabularies are relevant for the trend knowledge representation part of
this thesis, in particular: ConceptNet<sup>4</sup> and OpenMind<sup>5</sup> of MIT, MoaT<sup>6</sup>,
WordNet<sup>7</sup>, SentiWordNet<sup>8</sup>, Wortschatz Uni Leipzig<sup>9</sup>, DWDS<sup>10</sup>, SKOS<sup>11</sup>, and SCOT<sup>12</sup>.
      </p>
      <p>
        Regarding the relevant work outlined above and according to the two research
questions, this research focuses on the development of a semantic learning approach
for automatic trend mining in texts on the Web. It also proposes the use of a
trend ontology and elaborates on the extreme tagging approach [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] for
knowledge acquisition in the trend mining task. However, the main goal of this work
is neither to predict stock prices based on news analysis nor
to create an artificial trader for market trading based on text analysis. Nor is this
research about a general trend analysis system, and it does not study
the influence of Web news on emerging trends (it does not take into account the
distinction between trend creator news, trend follower news and mainstream news).
The general assumptions for this thesis are: context is crucial for successful trend
mining; collective associations like user tags from folksonomies enable the
creation of context knowledge; and statistical learning can be enhanced with background
knowledge using a knowledge representation approach from the Semantic Web.
      </p>
      <fn-group>
        <fn><label>1</label><p>http://www.google.de/trends</p></fn>
        <fn><label>2</label><p>http://www.blogpulse.com/</p></fn>
        <fn><label>3</label><p>http://www.opencalais.com/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-3">
      <title>General approach</title>
      <p>
        This thesis is anchored in Information Systems research, and the Design Science
paradigm [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ][
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is the methodology that provides the scientific framework for
my research. Two main research issues are the focus of my thesis: a
knowledge-integrating learning approach for trend mining based on machine learning, and
the representation of trend knowledge based on a Semantic Web approach.
Concentrating on them, I create my artefact (in terms of Design Science) and test and
evaluate my trend mining approach.
      </p>
      <p>
        So far, I have first carried out an extensive literature review comparing the following
general aspects of related projects on trend mining: trend definitions, general trend
analysis approaches, applied machine learning methods, and document corpora.
Based on these aspects, I elaborated a general definition of trends in text (this
gives the main setting for defining the learning problem in the next steps).
Furthermore, I implemented static storage, parsing and partial preprocessing of
a document corpus that consists of about 200,000 business news articles in German
from the period 2006-2007. I also elaborated on the trend ontology
approach [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and on the knowledge acquisition approach using tag tagging [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
      <p>
        In the next steps, I have to concentrate on the general description of the learning
problem in the case of mining trends in texts (what kind of feedback is available,
what kind of features should be learnt, how to extract trend labels, what the
feature space is and how well separable the different classes are, how the features
can be extended into semantic features, etc.). While defining the learning problem,
I also have to consider the representation of the learning data and the
representation of the background knowledge.
      </p>
      <p>In general, this thesis elaborates on the idea of semantic learning, which is the
adoption of the inductive learning approach from machine learning together with the
knowledge representation approach from the Semantic Web. The outcome of this
thesis is a knowledge-integrating method for mining trends in texts, which aims
at improving the quality of trend mining methods and adds a further
value to the existing methods: the trend explanation.</p>
      <fn-group>
        <fn><label>4</label><p>http://conceptnet.media.mit.edu/</p></fn>
        <fn><label>5</label><p>http://commons.media.mit.edu/en/</p></fn>
        <fn><label>6</label><p>http://moat-project.org/</p></fn>
        <fn><label>7</label><p>http://wordnet.princeton.edu/</p></fn>
        <fn><label>8</label><p>http://sentiwordnet.isti.cnr.it/</p></fn>
        <fn><label>9</label><p>http://wortschatz.uni-leipzig.de/</p></fn>
        <fn><label>10</label><p>http://www.dwds.de/</p></fn>
        <fn><label>11</label><p>http://www.w3.org/2004/02/skos/</p></fn>
        <fn><label>12</label><p>http://scot-project.org/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-4">
      <title>Proposed solution</title>
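      <p>At this stage of my work, the proposed solution starts with a few important
definitions: time window, time slice, burstiness, interestingness, utility and trend
indication. Based on them, an exact description of what trends in text are is
possible.</p>
      <p>Definition 1: Time window.
<italic>t</italic><sub>window</sub> is a time interval in which trends can occur. Furthermore, it can be
described as an ordered set of subintervals.
<italic>t</italic><sub>slice</sub><sup>13</sup> is a subinterval of the time window. If its starting point lies at <italic>t</italic><sub>0</sub>, its end
point has to lie at <italic>t</italic><sub>k</sub> &lt; <italic>t</italic><sub>n</sub>:</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math><![CDATA[
\begin{aligned}
t_{window} &= [t_0 \ldots t_n] \;\wedge\; t_{slice_k} = [t_0 \ldots t_k] \\
t_{window} &:= \{ t_{slice_k}, \ldots, t_{slice_n} \} \;\wedge\; |t_{slice_k}| = |t_{slice_n}| \;\wedge\; k, n \in \mathbb{N} \;\wedge\; k < n
\end{aligned}
        ]]></tex-math>
      </disp-formula>
      <p>Time slices have the same length.</p>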
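      <p>As a minimal illustration of Definition 1, the following Python sketch splits a time window into time slices of equal length and looks up the slice that contains a given date. It reads the slices as consecutive subintervals of the window. The function names <monospace>make_slices</monospace> and <monospace>slice_index</monospace>, the day granularity, and the example dates are assumptions made for this illustration only.</p>
      <preformat><![CDATA[
```python
from datetime import date, timedelta


def make_slices(start, end, n_slices):
    """Split the window [start, end) into n_slices consecutive, equal-length slices."""
    total_days = (end - start).days
    if n_slices <= 0 or total_days <= 0 or total_days % n_slices != 0:
        raise ValueError("the window must divide evenly into the time slices")
    step = timedelta(days=total_days // n_slices)
    return [(start + i * step, start + (i + 1) * step) for i in range(n_slices)]


def slice_index(slices, day):
    """Return the index of the slice that contains day, or None if it lies outside."""
    for i, (lo, hi) in enumerate(slices):
        if lo <= day < hi:
            return i
    return None


# Hypothetical four-week window with weekly time slices.
window = make_slices(date(2006, 1, 1), date(2006, 1, 29), 4)
print(slice_index(window, date(2006, 1, 10)))
```
]]></preformat>
      <p>Rejecting windows that do not divide evenly keeps the requirement that all time slices have the same length explicit.</p>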
      <p>
        Definition 2: Burstiness.
In order to distinguish the words in the documents of a given time slice from those in
all documents of the time window, the TF-IDF (term frequency, inverse document
frequency) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] function is adapted. For each word, the function result says how
important the word is in a given period of time. This function serves to
discover the burstiness of words: if there is a word in a given time slice which
appears only in the documents of this time slice and not in the rest of the window
(looking backwards), it could be the so-called entry point of a trend.
      </p>
      <p><fn><label>13</label><p>This is needed since only long-term trends are relevant for this thesis.</p></fn></p>
      <disp-formula>
        <label>(2)</label>
        <tex-math><![CDATA[
burst(w)_{t_{window}} := TF(w, |D|_{t_{slice}}) \cdot IDF(w, |D|_{t_{window}}),
\qquad
IDF(w, |D|_{t_{window}}) := \log \frac{|D|_{t_{window}}}{DF(w)_{t_{window}}}
        ]]></tex-math>
      </disp-formula>
      <p>where |<italic>D</italic>| is the total number of documents. If the word continues to appear
in the next time slices and becomes interesting, the word can become
trend-indicating. Based on the time component as in Def. 1, trend indication is defined by
interestingness and utility as follows.</p>
      <p>Definition 3: Interestingness.
Interestingness is defined by the frequency of word <italic>w</italic> in the time window. For a time slice, this
can be expressed by the sum of the term frequencies of word <italic>w</italic>
in all documents of the given time slice divided by the number of documents in
this time slice (scaled by the binary logarithm).</p>
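      <disp-formula>
        <label>(3)</label>
        <tex-math><![CDATA[
interest(w)_{t_{slice}} = f(w)_{t_{slice}} := \log \frac{\sum TF(w, D_{t_{slice}})}{|D|_{t_{slice}}}
        ]]></tex-math>
      </disp-formula>
      <p>For the trend indication it is important to know whether the interestingness of a word
is rising over the time window. Following formula 1 in Def. 1, interestingness over a
given time window is defined as</p>
      <disp-formula>
        <label>(4)</label>
        <tex-math><![CDATA[
interest(w)_{t_{window}} := \{ f(w)_{t_{slice_k}}, f(w)_{t_{slice_{k+1}}}, \ldots, f(w)_{t_{slice_n}} \}
        ]]></tex-math>
      </disp-formula>
      <p>which expresses increasing interestingness if<sup>14</sup>:</p>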
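      <disp-formula>
        <tex-math><![CDATA[
f(w)_{t_{slice_k}} < f(w)_{t_{slice_{k+1}}} < \ldots < f(w)_{t_{slice_n}}
        ]]></tex-math>
      </disp-formula>
      <p><fn><label>14</label><p>This thesis focuses only on upcoming trends and ignores falling trends.</p></fn></p>
      <p>Definition 4: Utility.
Utility expresses how popular users find a given word <italic>w</italic> in the given time
window. I propose to retrieve it by analysing collaborative tagging systems (CTB),
e.g. delicious, and estimating the popularity of the given word as a tag in the same
time window as for the trend estimation. The popularity can simply be described
by the number of resources in the CTB that have been tagged with the word <italic>w</italic>
in the given time window, divided by the number of all resources tagged in this time
window:</p>
      <disp-formula>
        <label>(5)</label>
        <tex-math><![CDATA[
util(w)_{t_{window}} := \log \frac{|R|(tag = w)_{t_{window}}}{|R|(tag)_{t_{window}}}
        ]]></tex-math>
      </disp-formula>
      <p>Definition 5: Trend indication.</p>
      <disp-formula>
        <label>(6)</label>
        <tex-math><![CDATA[
trendind(w)_{t_{window}} = burst(w)_{t_{slice_k}} + interest(w)_{t_{slice}} \cdot \frac{util(w)_{t_{window}}}{ratio(t_{window})}
        ]]></tex-math>
      </disp-formula>
      <p>where <italic>ratio</italic>(<italic>t</italic><sub>window</sub>) = |<italic>t</italic><sub>window</sub>|
is the number of time slices.</p>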
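      <p>Definitions 2 and 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: documents are represented as lists of tokens grouped by time slice, the binary logarithm is used for the IDF term as well, a word that never occurs gets a burstiness of 0 and an interestingness of minus infinity, and the function names and the toy corpus are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def term_freq(word, docs):
    """Summed number of occurrences of word in a list of tokenized documents."""
    return sum(doc.count(word) for doc in docs)


def burst(word, slice_docs, window_docs):
    """Definition 2: term frequency in the slice times IDF over the whole window."""
    df = sum(1 for doc in window_docs if word in doc)
    if df == 0:
        return 0.0  # assumption: a word that never occurs has no burstiness
    idf = math.log2(len(window_docs) / df)
    return term_freq(word, slice_docs) * idf


def interest(word, slice_docs):
    """Definition 3: log2 of the summed term frequency per document of the slice."""
    tf = term_freq(word, slice_docs)
    if tf == 0 or not slice_docs:
        return float("-inf")  # assumption: the logarithm of zero is minus infinity
    return math.log2(tf / len(slice_docs))


def is_increasing(values):
    """True if the interestingness strictly rises from slice to slice."""
    return all(a < b for a, b in zip(values, values[1:]))


# Toy window: three time slices with two tokenized documents each.
slices = [
    [["market", "falls"], ["bank", "report"]],
    [["solar", "market"], ["solar", "solar", "bank"]],
    [["solar", "solar", "solar", "boom"], ["solar", "solar", "market"]],
]
window_docs = [doc for docs in slices for doc in docs]
series = [interest("solar", docs) for docs in slices]
print(burst("solar", slices[1], window_docs), is_increasing(series))
```
]]></preformat>
      <p>In this toy corpus the word "solar" first appears in the second slice and becomes more frequent in the third, so its interestingness series rises, whereas a word with a constant frequency such as "market" does not qualify.</p>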
      <p>
        The definitions above allow for a general description of emerging topics in a given
time window: in the simplest case, emerging topics are the intersection of the
trend-indicating words (the set of all words that at some point in the time window
start to show bursty behavior and appear frequently enough to be discovered yet
rarely enough to be important in the given time window) with the set of words used as
tags in a CTB in this time window. Furthermore, the trend indication allows for
automatic labeling of the document corpus, dividing it into trend-indicating and
neutral documents (with regard to the time slices in which the documents appear).
However, this is the statistical part of the approach, and it focuses only on simple
words. At this stage of the thesis, tests still have to be carried out in order to show that it is
useful. Furthermore, I have to elaborate on the inclusion of background
knowledge into the labeling, either by applying my trend ontology [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] approach or the
tag tagging approach [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], in order to extend the features into "real" semantic
concepts, which I call statements, and at the same time to reduce the dimension
of the feature space.
      </p>
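      <p>The utility of Definition 4 and the simplest-case description of emerging topics as an intersection can be sketched as follows. The sketch assumes that the tagging data of the time window is already available as a mapping from resource identifiers to tag sets; no real tagging service is queried, and the names <monospace>utility</monospace>, <monospace>emerging_topics</monospace> and the sample data are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def utility(word, tagged):
    """Definition 4: log2 of the share of tagged resources that carry word as a tag.

    tagged maps a resource id to the set of tags it received in the time window.
    """
    with_word = sum(1 for tags in tagged.values() if word in tags)
    if with_word == 0:
        return float("-inf")  # assumption: a word never used as a tag has no utility
    return math.log2(with_word / len(tagged))


def emerging_topics(trend_words, tagged):
    """Simplest case: trend-indicating words that are also used as tags."""
    all_tags = set().union(*tagged.values()) if tagged else set()
    return set(trend_words) & all_tags


# Hypothetical tagging data for one time window.
tagged = {
    "url1": {"solar", "energy"},
    "url2": {"solar"},
    "url3": {"bank"},
    "url4": {"energy"},
}
print(utility("solar", tagged), emerging_topics({"solar", "boom"}, tagged))
```
]]></preformat>
      <p>Here "solar" tags two of four resources, giving a utility of log2(0.5), and it is the only candidate word that also occurs as a tag.</p>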
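      <p>As the learning approach, I propose to adapt Bayesian learning<sup>15</sup>. In this case,
Bayes' theorem can be stated in a very general way as:</p>
      <disp-formula>
        <label>(7)</label>
        <tex-math><![CDATA[
P(T \mid S) = \frac{P(S \mid T)\, P(T)}{P(S)}
        ]]></tex-math>
      </disp-formula>
      <p><italic>P</italic>(<italic>T</italic>|<italic>S</italic>) is the a posteriori probability of <italic>T</italic> conditioned on <italic>S</italic>, where <italic>T</italic> is
the hypothesis and <italic>S</italic> a statement. In the case of mining trends, <italic>T</italic> says that there
is an indication for a trend, and <italic>P</italic>(<italic>T</italic>|<italic>S</italic>) reflects the probability that the given
statement <italic>S</italic> indicates a trend (or that the given statement <italic>S</italic> is built on
trend-indicating concepts and therefore indicates a trend). <italic>P</italic>(<italic>T</italic>) and <italic>P</italic>(<italic>S</italic>) are the
a priori probabilities over <italic>T</italic> (any given statement causes a trend) and over <italic>S</italic> (any
statement from the training set is trend-indicating); <italic>P</italic>(<italic>S</italic>|<italic>T</italic>) can be estimated
from the given data.</p>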
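      <p>To make the use of Bayes' theorem concrete, the following sketch estimates the three probabilities on the right-hand side as relative frequencies in a small labeled training set and combines them into the posterior. It is an illustration only: statements are simplified to plain strings, no smoothing is applied, and the function name <monospace>posterior</monospace> and the training pairs are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def posterior(statement, labeled):
    """Estimate P(T|S) with Bayes' theorem from (statement, is_trend) pairs."""
    n = len(labeled)
    n_trend = sum(1 for _, is_trend in labeled if is_trend)
    s_count = Counter(s for s, _ in labeled)[statement]
    s_and_t = sum(1 for s, is_trend in labeled if s == statement and is_trend)
    if n_trend == 0 or s_count == 0:
        return 0.0  # assumption: without evidence there is no trend indication
    p_t = n_trend / n                # prior P(T)
    p_s = s_count / n                # prior P(S)
    p_s_given_t = s_and_t / n_trend  # likelihood P(S|T) estimated from the data
    return p_s_given_t * p_t / p_s


# Hypothetical training set of statements with trend labels.
labeled = [
    ("solar grows", True),
    ("solar grows", True),
    ("solar grows", False),
    ("bank report", False),
    ("bank report", True),
    ("rates flat", False),
]
print(posterior("solar grows", labeled))
```
]]></preformat>
      <p>With these counts the posterior for "solar grows" is 2/3, which equals the share of its occurrences that are labeled as trend-indicating, as expected for frequency-based estimates.</p>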
      <p>At this stage of my work, I am starting the tests for trend feature extraction and
continue to elaborate on my solution for the integration of background knowledge
as well as on a proper definition of the learning method.</p>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
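      <p>The evaluation of my approach is primarily based on the evaluation of the model
performance, which can be conducted using cross-validation and measured in
general by recall and precision. For the cross-validation, the document
corpus is divided into <italic>i</italic> folds and the validation process is repeated <italic>i</italic> times, whereby
in every step of the validation one 1/<italic>i</italic> part of the document corpus is used as
a test set while the remaining (<italic>i</italic>-1)/<italic>i</italic> of the corpus is used for building the learning model. If
<italic>D</italic> is the set of documents and |<italic>D</italic>| is the total number of documents in the set, the
precision and recall values are:</p>
      <disp-formula>
        <label>(8)</label>
        <tex-math><![CDATA[
precision = \frac{|D|_{\text{trend-indicating and retrieved}}}{|D|_{\text{retrieved}}},
\qquad
recall = \frac{|D|_{\text{trend-indicating and retrieved}}}{|D|_{\text{trend-indicating}}}
        ]]></tex-math>
      </disp-formula>
      <p>Also, for numeric prediction, the relative absolute error measure can be
applied:</p>
      <disp-formula>
        <label>(9)</label>
        <tex-math><![CDATA[
\frac{|p_1 - a_1| + \ldots + |p_n - a_n|}{|a_1 - \bar{a}| + \ldots + |a_n - \bar{a}|}
\quad \text{with} \quad
\bar{a} = \frac{1}{n} \sum_i a_i
        ]]></tex-math>
      </disp-formula>
      <p>where <italic>p</italic><sub>1</sub>, <italic>p</italic><sub>2</sub>, ..., <italic>p</italic><sub>n</sub> are the predicted values for the test instances and <italic>a</italic><sub>1</sub>, <italic>a</italic><sub>2</sub>, ..., <italic>a</italic><sub>n</sub> the
actual values. The formulas above give only an insight into the possible ways of
measuring. The final evaluation depends on the final learning model and should also
take into account the knowledge integration part (in the case
of decision trees, for example, this could be done by additionally measuring the changes in information gain values).</p>
      <p><fn><label>15</label><p>However, decision trees (good for visualization and comprehension of the model)
and support vector machines (most reliable classification method) also have to be
considered.</p></fn></p>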
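      <p>The evaluation measures described in this section can be sketched in Python as follows. The sketch is a plain illustration rather than the final evaluation procedure: folds are built by simple striding over document indices, documents are identified by integer ids, the actual values are assumed not to be all equal (otherwise the relative absolute error is undefined), and all function names are hypothetical.</p>
      <preformat><![CDATA[
```python
def k_fold_indices(n_docs, k):
    """Yield (train, test) index lists; each fold serves exactly once as test set."""
    folds = [list(range(i, n_docs, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def precision_recall(retrieved, trend_indicating):
    """Both arguments are sets of document ids."""
    hits = len(retrieved & trend_indicating)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(trend_indicating) if trend_indicating else 0.0
    return precision, recall


def relative_absolute_error(predicted, actual):
    """Sum of |p_i - a_i| divided by the sum of |a_i - mean(a)|."""
    mean_a = sum(actual) / len(actual)
    numerator = sum(abs(p - a) for p, a in zip(predicted, actual))
    denominator = sum(abs(a - mean_a) for a in actual)
    return numerator / denominator


splits = list(k_fold_indices(6, 3))
print(splits[0])
print(precision_recall({1, 2, 3}, {2, 3, 4, 5}))
print(relative_absolute_error([2, 4, 6], [1, 4, 7]))
```
]]></preformat>
      <p>A relative absolute error below 1 means that the predictions are closer to the actual values than the simple predictor that always returns the mean of the actual values.</p>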
    </sec>
    <sec id="sec-6">
      <title>Future Work</title>
      <p>Many research issues are relevant to this thesis. From the information retrieval
point of view, one of them is, for example, research on a graph-based
representation model for documents and semantic indexing of document collections.
At this stage of the work it is too early to expand on the remaining issues.</p>
      <p>Acknowledgments. This work has been partially supported by the InnoProfile-Corporate
Semantic Web project funded by the German Federal Ministry of
Education and Research (BMBF) and the BMBF Innovation Initiative for the
New German Länder - Entrepreneurial Regions. The author wants to thank Prof.
Robert Tolksdorf and Prof. Abraham Bernstein for their helpful comments on
the content of this thesis.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Edward L.</given-names>
            <surname>Wimmers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mohamed</given-names>
            <surname>Zait</surname>
          </string-name>
          .
          <article-title>Querying shapes of histories</article-title>
          . pages
          <fpage>502</fpage>
          -
          <lpage>514</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Khurshid</given-names>
            <surname>Ahmad</surname>
          </string-name>
          .
          <article-title>Events and the causes of events</article-title>
          . In Lee Gillam, editor,
          <source>Proceedings of the Workshop on Making Money in the Financial Services Industry, at the 6th International Conference on Terminology and Knowledge Engineering</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>James</given-names>
            <surname>Allan</surname>
          </string-name>
          .
          <source>Topic Detection and Tracking: Event-based Information Organization</source>
          . Kluwer Academic Publishers,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>James</given-names>
            <surname>Allan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ron</given-names>
            <surname>Papka</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Victor</given-names>
            <surname>Lavrenko</surname>
          </string-name>
          .
          <article-title>On-line new event detection and tracking</article-title>
          .
          In
          <source>SIGIR '98: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>37</fpage>
          -
          <lpage>45</lpage>
          , New York, NY, USA,
          <year>1998</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Michael</given-names>
            <surname>Berry</surname>
          </string-name>
          .
          <source>Survey of Text Mining: Clustering, Classification, and Retrieval</source>
          . Springer Science+Business Media, Inc.,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Desh</given-names>
            <surname>Peramunetilleke</surname>
          </string-name>
          and
          <string-name>
            <given-names>Raymond K.</given-names>
            <surname>Wong</surname>
          </string-name>
          .
          <article-title>Currency exchange rate forecasting from news headlines</article-title>
          .
          In
          <source>Proceedings 13th Australasian Database Conference</source>
          , pages
          <fpage>131</fpage>
          -
          <lpage>139</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Tom</given-names>
            <surname>Fawcett</surname>
          </string-name>
          and
          <string-name>
            <given-names>Foster</given-names>
            <surname>Provost</surname>
          </string-name>
          .
          <article-title>Activity monitoring: Noticing interesting changes in behavior</article-title>
          .
          In
          <source>Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining</source>
          , pages
          <fpage>53</fpage>
          -
          <lpage>62</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
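          <string-name>
            <given-names>Paulo</given-names>
            <surname>Felix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Santiago</given-names>
            <surname>Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Roque</given-names>
            <surname>Marín</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Senen</given-names>
            <surname>Barro</surname>
          </string-name>
          .
          <article-title>Trend detection based on a fuzzy temporal profile model</article-title>
          .
          <source>AI in Engineering</source>
          ,
          <volume>13</volume>
          (
          <issue>4</issue>
          ):
          <fpage>341</fpage>
          -
          <lpage>349</lpage>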
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>L.</given-names>
            <surname>Gillam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Casey</surname>
          </string-name>
          , D. Cheng, T. Taskaya,
          <string-name>
            <given-names>P.C.F.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P</given-names>
            <surname>Manomaisupat</surname>
          </string-name>
          .
          <article-title>Economic news and stock market correlation: A study of the UK market</article-title>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. J. Han and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kamber</surname>
          </string-name>
          .
          <article-title>Data Mining Concepts and Techniques</article-title>
          . Morgan Kaufmann Publishers Inc,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Hevner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>March</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ram</surname>
          </string-name>
          .
          <article-title>Design science in information systems research</article-title>
          .
          <source>MIS Quarterly</source>
          ,
          <volume>28</volume>
          (
          <issue>1</issue>
          ):
          <fpage>75</fpage>
          -
          <lpage>106</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Jon</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          .
          <article-title>Bursty and hierarchical structure in streams</article-title>
          .
          In
          <source>KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <fpage>91</fpage>
          -
          <lpage>101</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>April</surname>
            <given-names>Kontostathis</given-names>
          </string-name>
          , Leon Galitsky, William M. Pottenger, Soma Roy, and
          <string-name>
            <given-names>Daniel J.</given-names>
            <surname>Phelps</surname>
          </string-name>
          .
          <article-title>A Survey of Emerging Trend Detection in Textual Data Mining</article-title>
          .
          <publisher-name>Springer-Verlag</publisher-name>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>April</given-names>
            <surname>Kontostathis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lars E.</given-names>
            <surname>Holzman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>William M.</given-names>
            <surname>Pottenger</surname>
          </string-name>
          .
          <article-title>Use of term clusters for emerging trend detection</article-title>
          .
          <source>Technical report</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
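          <string-name>
            <given-names>Victor</given-names>
            <surname>Lavrenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Matt</given-names>
            <surname>Schmill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dawn</given-names>
            <surname>Lawrie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Paul</given-names>
            <surname>Ogilvie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Jensen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>James</given-names>
            <surname>Allan</surname>
          </string-name>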
          .
          <article-title>Mining of concurrent text and time series</article-title>
          .
          In
          <source>Proceedings of the 6th ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining Workshop on Text Mining</source>
          , pages
          <fpage>37</fpage>
          –
          <lpage>44</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Brian</given-names>
            <surname>Lent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ramakrishnan</given-names>
            <surname>Srikant</surname>
          </string-name>
          .
          <article-title>Discovering trends in text databases</article-title>
          .
          pages
          <fpage>227</fpage>
          –
          <lpage>230</lpage>
          .
          <publisher-name>AAAI Press</publisher-name>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Marc-Andre</given-names>
            <surname>Mittermayer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gerhard F.</given-names>
            <surname>Knolmayer</surname>
          </string-name>
          .
          <article-title>Newscats: A news categorization and trading system</article-title>
          .
          <source>Data Mining, IEEE International Conference on</source>
          ,
          <volume>0</volume>
          :
          <fpage>1002</fpage>
          –
          <lpage>1007</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Satoshi</given-names>
            <surname>Morinaga</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kenji</given-names>
            <surname>Yamanishi</surname>
          </string-name>
          .
          <article-title>Tracking dynamics of topic trends using a finite mixture model</article-title>
          .
          In
          <source>Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          ,
          <conf-loc>Seattle, Washington, USA</conf-loc>
          ,
          <conf-date>August 22-25, 2004</conf-date>
          , pages
          <fpage>811</fpage>
          –
          <lpage>816</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Olga</given-names>
            <surname>Streibel</surname>
          </string-name>
          .
          <article-title>XML-Clearinghouse report 17: XML-technologies and semantic web for trend mining in business applications</article-title>
          .
          <source>Technical report</source>
          , Freie Universität Berlin,
          <institution>XML-Clearinghouse Project</institution>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>William M.</given-names>
            <surname>Pottenger</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ting-Hao</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>Detecting emerging concepts in textual data mining</article-title>
          .
          pages
          <fpage>89</fpage>
          –
          <lpage>105</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Gerard</given-names>
            <surname>Salton</surname>
          </string-name>
          .
          <source>Automatic Text Processing</source>
          .
          <publisher-name>Addison-Wesley</publisher-name>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Herbert A.</given-names>
            <surname>Simon</surname>
          </string-name>
          .
          <source>The sciences of the artificial (3rd ed.)</source>
          .
          <publisher-name>MIT Press</publisher-name>
          ,
          <publisher-loc>Cambridge, MA, USA</publisher-loc>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>Olga</given-names>
            <surname>Streibel</surname>
          </string-name>
          and
          <string-name>
            <given-names>Malgorzata</given-names>
            <surname>Mochol</surname>
          </string-name>
          .
          <article-title>Trend ontology for knowledge-based trend mining in textual information</article-title>
          .
          In
          <source>7th International Conference on Internet Technology: New Generations</source>
          , pages
          <fpage>1285</fpage>
          –
          <lpage>1288</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>Russel</given-names>
            <surname>Swan</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Jensen</surname>
          </string-name>
          .
          <article-title>TimeMines: Constructing timelines with statistical models of word usage</article-title>
          .
          In
          <source>KDD-2000 Workshop on Text Mining</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>Vlad</given-names>
            <surname>Tanasescu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Olga</given-names>
            <surname>Streibel</surname>
          </string-name>
          .
          <article-title>Extreme tagging: Emergent semantics through the tagging of tags</article-title>
          .
          In
          <source>ESOE</source>
          , pages
          <fpage>84</fpage>
          –
          <lpage>94</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>Henrik</given-names>
            <surname>Vejlgaard</surname>
          </string-name>
          .
          <source>Anatomy of a Trend</source>
          .
          <publisher-name>McGraw-Hill</publisher-name>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27. B. Wuthrich,
          <string-name>
            <given-names>D.</given-names>
            <surname>Permunetilleke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Leung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lam</surname>
          </string-name>
          .
          <article-title>Daily prediction of major stock indices from textual www data</article-title>
          .
          In
          <source>Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining - KDD-98</source>
          , pages
          <fpage>364</fpage>
          –
          <lpage>368</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>