<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Mining Trends in Texts on the Web</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Olga</forename><surname>Streibel</surname></persName>
							<email>streibel@inf.fu-berlin.de</email>
							<affiliation key="aff0">
								<orgName type="department">Networked Information Systems</orgName>
								<orgName type="institution">Free University Berlin</orgName>
								<address>
									<addrLine>Königin-Luise-Str.24-26</addrLine>
									<postCode>14195</postCode>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><roleName>Prof. Dr.-Ing.</roleName><forename type="first">Robert</forename><surname>Tolksdorf</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Networked Information Systems</orgName>
								<orgName type="institution">Free University Berlin</orgName>
								<address>
									<addrLine>Königin-Luise-Str.24-26</addrLine>
									<postCode>14195</postCode>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Mining Trends in Texts on the Web</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">39A71B87B6E800E68EBADAB1C5082E2F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T23:44+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>trend mining</term>
					<term>machine learning</term>
					<term>knowledge acquisition</term>
					<term>knowledge integration</term>
					<term>semantic learning</term>
					<term>tagging</term>
					<term>folksonomy</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>From online news and blog articles, a human can often deduce the information and knowledge needed to predict market movements or sociological trends. However, this recognition and comprehension process is very complex and requires experience as well as context knowledge about the domain in which trends are to be detected. In order to support human experts in trend analysis, I propose an automatic trend mining method based on a knowledge-integrating learning approach.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Problem statement</head><p>"Many people have been led to believe that trends are about intuition. This is because the majority of the people who work with trends find it difficult to explain why something will happen the way they say it will. The explanation often boils down to 'because I think so.' Some people do seem to be able to predict what will happen based on their own intuition. Unfortunately, there are too many cases in which people's intuition has obviously been mistaken (...)" <ref type="bibr" target="#b25">[26]</ref></p><p>From the sociologists' point of view, detecting trends is an analytical method for observing changes in people's behavior over time with regard to "six attitudes towards trends", defined as trendsetters, trend followers, early mainstreamers, mainstreamers, late mainstreamers and conservatives (see "The Diamond-Shaped Trend Model" in <ref type="bibr" target="#b25">[26]</ref>). Consequently, trends are certain patterns of people's behavior and lifestyle that evolve over a given time interval, and the word trend refers to a process of change. From the statistical point of view, detecting trends is based on the analysis of time-series data with two goals: "modeling time series (i.e. to gain insight into the mechanisms or underlying forces that generate the time series) and forecasting time series (i.e., to predict the future values of the timeseries variables)" (p. 490, <ref type="bibr" target="#b9">[10]</ref>). In these terms, a trend is the general direction in which a time-series graph, based on numeric data, moves over a given interval of time. Finally, detecting trends in text collections refers to the detection of emerging topics in texts. 
In terms of textual data mining, a trend in texts is defined as "a topic area that is growing in interest and utility over time" <ref type="bibr" target="#b12">[13]</ref>, whereas a topic in Topic Detection and Tracking (TDT) <ref type="bibr" target="#b2">[3]</ref> research is "defined to be a set of news stories that are strongly related by some seminal real world event". These points of view on trend detection show the different dimensions of trend analysis research. However, they have one thing in common: they observe patterns of change in certain variables (i.e., people, numbers, words) that lead to a general change, the emerging trend, in the system that depends on these variables.</p><p>As already defined in my trend ontology approach <ref type="bibr" target="#b22">[23]</ref>, this research uses trend mining as a general term covering trend detection, trend recognition and trending analysis. It can refer either to the detection of emerging topic areas from text analysis or to the detection of trends from numeric data analysis, as in the case of stock values. This work, however, focuses only on textual data available on the Web, i.e. online news and blogs, and on learning from these data while including related background knowledge in order to capture and explain trends. In general, I use the term trend in texts to refer to "emerging topic areas" (see also Section 4), and the objective of trend mining is "to provide an alert that new developments are happening in a specific area of interest in an automated way" <ref type="bibr" target="#b12">[13]</ref>. Interesting approaches have been developed in the field of trend mining on texts (see the following section), but they still lack the integration of expert knowledge into the process of trend recognition. Such knowledge is crucial for proper trend mining, and the lack of methods that integrate expert knowledge is a research gap. This thesis aims at closing this gap. 
It treats trend detection as a complex learning task, namely learning and recognizing complex relations and dependencies in a given domain along the time dimension. I focus on a learning method that is able to integrate expert knowledge in order to automatically recognize trends in text collections. Considering that "In general, trending analysis of textual data can be performed in any domain that involves written records of human endeavors whether scientific or artistic in nature." <ref type="bibr" target="#b19">[20]</ref>, trend mining based on texts is useful for many application domains, e.g. medical diagnosis, opinion mining, market monitoring and stock market analysis. Given the growing amount of information available on the Internet and the resulting need for intelligent data analysis, it has become an increasingly important research topic in recent years. Besides contributing to Trend Mining research, this thesis can have an important impact on Machine Learning and on the Semantic Web.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Main questions of the thesis</head><p>Two main research questions are important for this thesis: 1) How can existing machine learning approaches for trend mining be turned into knowledge-integrating learning approaches with regard to the development of the Semantic Web? 2) How can trend knowledge be acquired and formalized?</p><p>The main research projects in the field of trend mining are described in Topic Detection and Tracking (TDT) research <ref type="bibr" target="#b2">[3]</ref> and in Emergent Trend Detection (ETD) <ref type="bibr" target="#b4">[5]</ref>. Regarding related work, I first concentrate on research in trend mining with a focus on machine learning algorithms, since they seem to be crucial for automatic trend mining. One such work, the EAnalyst system described in <ref type="bibr" target="#b14">[15]</ref>, showed that emerging trends can be determined and detected early from numeric data as well as from texts. EAnalyst has been designed and implemented as a general architecture for associating news stories with trends. The system collects hybrid data, i.e. financial time series and time-stamped news stories, redescribes the time series data as "high-level features", called trends, and aligns each trend with time-stamped news stories. These news stories serve as the training set for learning a language model that captures the statistics of word usage patterns in the stories. This language model, learnt for every trend type, is used to monitor a stream of new incoming news stories, which are processed according to the learnt hypothesis. The authors define the task of trend detection as a special case of Activity Monitoring as introduced by <ref type="bibr" target="#b6">[7]</ref>. This work establishes the general precondition of my thesis: it is possible to automatically recognize trends by analyzing texts. 
Unlike EAnalyst, I do not elaborate on text stream monitoring but focus on the recognition and comprehension process of trend mining. The Emergent Trend Detection (ETD) systems surveyed in <ref type="bibr" target="#b12">[13]</ref> have been characterized by the following aspects, which are important for building a trend analysis system: input data and attributes, learning algorithms, and visualization. The most relevant perspective for comparison with our work is the learning algorithms. According to the system descriptions in <ref type="bibr" target="#b12">[13]</ref> and the prototypes <ref type="bibr" target="#b26">[27]</ref><ref type="bibr" target="#b16">[17]</ref><ref type="bibr" target="#b5">[6]</ref>, the following learning algorithms have proven useful for trend mining:</p><list type="bulleted"><item>combined "hypothesis testing"-based methods (Time Mines <ref type="bibr" target="#b23">[24]</ref>)</item><item>single-pass clustering (New Event Detection <ref type="bibr" target="#b3">[4]</ref>)</item><item>sequential pattern matching and shape query processing (Patent Miner <ref type="bibr" target="#b15">[16]</ref><ref type="bibr" target="#b0">[1]</ref>)</item><item>feed-forward backpropagation neural networks, C4.5 and SVM (Hierarchical Distributed Dynamic Indexing <ref type="bibr" target="#b19">[20]</ref>, Wüthrich <ref type="bibr" target="#b26">[27]</ref>)</item><item>k-NN classifier (Wüthrich <ref type="bibr" target="#b26">[27]</ref>)</item><item>regression analysis (Wüthrich <ref type="bibr" target="#b26">[27]</ref>)</item></list></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Besides these, there are many other research works related to trend mining, e.g. trend detection based on a fuzzy temporal profile model <ref type="bibr" target="#b7">[8]</ref>, modeling bursty streams using an infinite-state automaton <ref type="bibr" target="#b11">[12]</ref>, a finite mixture model for tracking the dynamics of topic trends <ref type="bibr" target="#b17">[18]</ref>, and clustering approaches <ref type="bibr" target="#b13">[14]</ref><ref type="bibr" target="#b2">[3]</ref>. Concerning both trend mining based on texts and enhanced text analysis, there are many related scientific and commercial projects and services on the Internet that are to some extent relevant for this work: GoogleTrends<ref type="foot" target="#foot_0">1</ref>, BlogPulse<ref type="foot" target="#foot_1">2</ref> and OpenCalais<ref type="foot" target="#foot_2">3</ref>. Two research projects are also relevant for this thesis: GIDA (Generic Information-based Decision Assistant) <ref type="bibr" target="#b8">[9]</ref><ref type="bibr" target="#b1">[2]</ref> and its successor TREMA (Trend Mining, Fusion and Analysis of multimodal Data) <ref type="bibr" target="#b18">[19]</ref>, which concentrated on the fusion of multimodal market data in order to mine trends in financial markets (GIDA, TREMA) and in market research (TREMA). 
Several projects concerned with lightweight ontologies and extended vocabularies are relevant for the trend knowledge representation part of this thesis, in particular: ConceptNet<ref type="foot" target="#foot_3">4</ref> and OpenMind<ref type="foot" target="#foot_4">5</ref> of MIT, MoaT<ref type="foot" target="#foot_5">6</ref>, WordNet<ref type="foot" target="#foot_6">7</ref>, SentiWordNet<ref type="foot" target="#foot_7">8</ref>, Wortschatz Uni Leipzig<ref type="foot" target="#foot_8">9</ref>, DWDS<ref type="foot" target="#foot_9">10</ref>, SKOS<ref type="foot" target="#foot_10">11</ref> and SCOT<ref type="foot" target="#foot_11">12</ref>.</p><p>Based on the related work outlined above and on the two research questions, this research focuses on the development of a semantic learning approach for automatic trend mining in texts on the Web. It also proposes the use of a trend ontology and elaborates on the extreme tagging approach <ref type="bibr" target="#b24">[25]</ref> for knowledge acquisition in the trend mining task. However, the goal of this work is neither to predict stock prices based on news analysis nor to create an artificial trader for market trading based on text analysis. Nor is this research about a general trend analysis system, and it does not study the influence of Web news on emerging trends (it does not distinguish between trend-creator news, trend-follower news and mainstream news). The general assumptions of this thesis are: (1) context is crucial for successful trend mining; (2) collective associations, such as user tags from folksonomies, enable the creation of context knowledge; and (3) statistical learning can be enhanced with background knowledge using knowledge representation approaches from the Semantic Web.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">General approach</head><p>This thesis is anchored in Information Systems research, and the Design Science paradigm <ref type="bibr" target="#b21">[22]</ref><ref type="bibr" target="#b10">[11]</ref> provides the methodological framework for my research. Two main research issues are the focus of my thesis: a knowledge-integrating learning approach for trend mining based on Machine Learning, and the representation of trend knowledge based on Semantic Web approaches. Concentrating on these, I create my artefact (in terms of Design Science) and test and evaluate my trend mining approach.</p><p>So far, I have carried out an extensive literature review comparing the following general aspects of related trend mining projects: trend definitions, general trend analysis approaches, applied machine learning methods and document corpora. Based on this review, I developed a general definition of trends in text, which provides the main setting for defining the learning problem in the next steps. Furthermore, I implemented static storage, parsing and partial preprocessing for a document corpus of about 200,000 German-language business news articles from the period 2006-2007. I also elaborated on the trend ontology approach <ref type="bibr" target="#b22">[23]</ref> and on the knowledge acquisition approach using tag tagging <ref type="bibr" target="#b24">[25]</ref>. In the next steps, I have to concentrate on the general description of the learning problem for mining trends in texts: what kind of feedback is available, which features should be learnt, how to extract trend labels, what the feature space is and how well the different classes can be separated, how the features can be extended into semantic features, etc. While defining the learning problem, I also have to consider the representation of the learning data and of the background knowledge. 
In general, this thesis elaborates on the idea of semantic learning, which combines the inductive learning approach from Machine Learning with the knowledge representation approach from the Semantic Web. The outcome of this thesis is a knowledge-integrating method for mining trends in texts, which aims at improving the quality of trend mining and adds a further value to the existing methods: trend explanation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Proposed solution</head><p>At this stage of my work, the proposed solution starts with a few important definitions: time window, time slice, burstiness, interestingness, utility and trend indication. Based on them, an exact description of what trends in text are is possible.</p><p>Definition 1: Time window. The time window t_window is a time interval in which trends can occur. It can also be described as an ordered set of subintervals. A time slice t_slice is a subinterval of the time window: if its starting point lies at t_0, its end point has to lie at t_k &lt; t_n. All time slices have the same length.</p><formula xml:id="formula_0">t_{\mathrm{window}} = [t_0, \ldots, t_n] \wedge t_{\mathrm{slice}_k} = [t_0, \ldots, t_k], \quad t_{\mathrm{window}} := \{t_{\mathrm{slice}_k}, \ldots, t_{\mathrm{slice}_n}\} \wedge |t_{\mathrm{slice}_k}| = |t_{\mathrm{slice}_n}| \wedge k, n \in \mathbb{N} \wedge k &lt; n<label>(1)</label></formula></div>
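<div xmlns="http://www.tei-c.org/ns/1.0"><p>Definition 1 can be illustrated with a small Python sketch. It assumes that the time slices are consecutive and non-overlapping (the definition itself only requires equal length), and the helpers split_time_window and slice_index are hypothetical names introduced here for illustration, not part of the thesis.</p><p><code lang="python">
```python
from datetime import datetime, timedelta


def split_time_window(t0: datetime, tn: datetime, n_slices: int) -> list[tuple[datetime, datetime]]:
    """Split the time window [t0, tn] into n_slices consecutive slices of equal length.

    Assumption: slices are contiguous and non-overlapping. Dividing a timedelta
    by an int may round to whole microseconds if the window is not evenly divisible.
    """
    if not n_slices > 0:
        raise ValueError("n_slices must be positive")
    if not tn > t0:
        raise ValueError("the window end must lie after its start")
    length = (tn - t0) / n_slices  # every slice has the same length
    return [(t0 + i * length, t0 + (i + 1) * length) for i in range(n_slices)]


def slice_index(ts: datetime, t0: datetime, length: timedelta) -> int:
    """Return the index of the time slice that contains timestamp ts."""
    return (ts - t0) // length  # timedelta // timedelta yields an int


# Example: a ten-day window split into five two-day slices.
slices = split_time_window(datetime(2006, 1, 1), datetime(2006, 1, 11), 5)
print(slices[1])
print(slice_index(datetime(2006, 1, 4, 12), datetime(2006, 1, 1), timedelta(days=2)))
```
</code></p><p>Assigning each time-stamped document to a slice index in this way would be a natural first step before computing the per-slice measures defined next.</p></div>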
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 2: Burstiness</head><p>In order to distinguish the words in the documents of a given time slice from those in all documents of the time window, the TFIDF (term frequency inverse document frequency) <ref type="bibr" target="#b20">[21]</ref> function is adapted. For each word, the function value indicates how important the word is in a given period of time. This function is used to discover the burstiness of words: if a word in a given time slice appears only in the documents of this time slice and not in the rest of the window (looking backwards), it could be the so-called entry point of a trend.</p><formula xml:id="formula_1">\mathrm{burst}(w)_{t_{\mathrm{window}}} := TF(w, D_{t_{\mathrm{slice}}}) \cdot IDF(w, D_{t_{\mathrm{window}}}), \qquad IDF(w, D_{t_{\mathrm{window}}}) := \log \frac{|D|_{t_{\mathrm{window}}}}{DF(w)_{t_{\mathrm{window}}}}<label>(2)</label></formula><p>Here |D| denotes the total number of documents and DF(w) the number of documents containing w. If the word continues to appear in the following time slices and becomes interesting, it can become trend-indicating. Based on the time component of Definition 1, trend indication is defined by interestingness and utility as follows.</p><p>Definition 3: Interestingness. Interestingness is defined by the frequency of word w in the time window. For a single time slice, it can be expressed as the sum of the term frequencies of w in all documents of that time slice divided by the number of documents in the slice, scaled by the binary logarithm.</p></div>
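<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal Python sketch of burstiness and interestingness (Definitions 2 and 3) follows. It assumes that documents are already tokenized into lists of words, that TF(w, D_t_slice) is the total count of w over all documents of the slice, and that the IDF logarithm is binary like the one in Definition 3 (the text does not fix its base). The function names and the toy data are hypothetical.</p><p><code lang="python">
```python
import math


def term_frequency(word: str, docs: list[list[str]]) -> int:
    """Total number of occurrences of word in all documents (lists of tokens)."""
    return sum(doc.count(word) for doc in docs)


def burst(word: str, slice_docs: list[list[str]], window_docs: list[list[str]]) -> float:
    """Burstiness (Eq. 2): TF in the time slice times IDF over the time window."""
    df = sum(1 for doc in window_docs if word in doc)  # document frequency in the window
    if df == 0:
        return 0.0  # the word never occurs in the window
    idf = math.log2(len(window_docs) / df)
    return term_frequency(word, slice_docs) * idf


def interest(word: str, slice_docs: list[list[str]]) -> float:
    """Interestingness (Eq. 3): binary log of the word's TF per document in the slice."""
    tf = term_frequency(word, slice_docs)
    if tf == 0:
        return float("-inf")  # log2(0) is undefined; treat an absent word as minimally interesting
    return math.log2(tf / len(slice_docs))


# Hypothetical toy window of two slices with two documents each.
slice_1 = [["oil", "price"], ["market", "news"]]
slice_2 = [["ebook", "reader"], ["ebook", "launch", "ebook"]]
window = slice_1 + slice_2
print(burst("ebook", slice_2, window))  # TF = 3 in the slice, DF = 2 of 4 window documents
print(interest("ebook", slice_2))       # log2(3 / 2)
```
</code></p><p>A word such as "ebook" that is absent from earlier slices but frequent in the current one gets a high IDF and a high slice TF, which is the entry-point behaviour Definition 2 describes.</p></div>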
<div xmlns="http://www.tei-c.org/ns/1.0"><formula xml:id="formula_2">\mathrm{interest}(w)_{t_{\mathrm{slice}}} = f(w)_{t_{\mathrm{slice}}} := \log_2 \frac{TF(w, D_{t_{\mathrm{slice}}})}{|D|_{t_{\mathrm{slice}}}}<label>(3)</label></formula><p>For trend indication, it is important to know whether the interestingness of a word rises over the time window. Using Equation 1 of Definition 1, we define for a given time window:</p><formula xml:id="formula_3">\mathrm{interest}(w)_{t_{\mathrm{window}}} := \{f(w)_{t_{\mathrm{slice}_k}}, f(w)_{t_{\mathrm{slice}_{k+1}}}, \ldots, f(w)_{t_{\mathrm{slice}_n}}\}<label>(4)</label></formula><p>This sequence expresses increasing interestingness if:</p><formula xml:id="formula_4">f(w)_{t_{\mathrm{slice}_k}} &lt; f(w)_{t_{\mathrm{slice}_{k+1}}} &lt; \ldots &lt; f(w)_{t_{\mathrm{slice}_n}}</formula><p>In the trend indication formula (Definition 5), ratio(t_window) denotes the number of time slices in the window:</p><formula xml:id="formula_6">\mathrm{ratio}(t_{\mathrm{window}}) = |t_{\mathrm{window}}|</formula><p>The definitions above allow for a general description of emerging topics in a given time window. In the simplest case, emerging topics are the intersection of the trend-indicating words (the set of all words that at some point in the time window start to show bursty behavior and appear frequently enough to be discovered but rarely enough to be important in the given time window) with the set of words used as tags in a collaborative tagging system (CTB) in this time window. Furthermore, trend indication allows for automatic labeling of the document corpus, dividing it into trend-indicating and neutral documents (with respect to the time slices in which the documents appear). However, this is the statistical part of the approach, and it focuses only on simple words. At this stage of the thesis, tests still have to be done in order to prove it useful. 
Furthermore, I have to elaborate on the inclusion of background knowledge in the labeling, either by applying my trend ontology approach <ref type="bibr" target="#b22">[23]</ref> or the tag tagging approach <ref type="bibr" target="#b24">[25]</ref>, in order to extend the features into "real" semantic concepts, which I call statements, and at the same time to reduce the dimension of the feature space.</p><p>As the learning approach, I propose to adapt Bayesian learning<ref type="foot">15</ref>. In this setting, Bayes' theorem can be stated in a very general way as:</p><formula xml:id="formula_7">P(T \mid S) = \frac{P(S \mid T)\, P(T)}{P(S)}<label>(7)</label></formula><p>Here T is the hypothesis and S a statement, and P(T|S) is the a posteriori probability of T conditioned on S. In the case of trend mining, T states that there is an indication of a trend, and P(T|S) reflects the probability that the given statement S indicates a trend (or that S is built on trend-indicating concepts and therefore indicates a trend). P(T) and P(S) are the a priori probabilities over T (that any given statement causes a trend) and over S (that any statement from the training set is trend-indicating), and P(S|T) can be estimated from the given data.</p><p>At this stage of my work, I am starting the tests for trend feature extraction and continue to elaborate on my solution for integrating background knowledge, as well as on a proper definition of the learning method.</p></div>
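<div xmlns="http://www.tei-c.org/ns/1.0"><p>Equation 7 can be made concrete with simple frequency estimates. The following sketch assumes a training set of (statement, label) pairs, where a statement is any hashable value (here a tuple of concepts) and the label says whether it was marked trend-indicating. P(T), P(S) and P(S|T) are estimated as relative frequencies, which is a standard choice rather than one prescribed by the thesis; the function name posterior_trend and the example statements are hypothetical.</p><p><code lang="python">
```python
from collections.abc import Hashable, Iterable


def posterior_trend(training: Iterable[tuple[Hashable, bool]], statement: Hashable) -> float:
    """Estimate P(T|S) = P(S|T) P(T) / P(S) from relative frequencies."""
    data = list(training)
    if not data:
        raise ValueError("empty training set")
    n = len(data)
    n_trend = sum(1 for _, is_trend in data if is_trend)
    n_stmt = sum(1 for s, _ in data if s == statement)
    n_stmt_trend = sum(1 for s, is_trend in data if is_trend and s == statement)
    if n_stmt == 0:
        raise ValueError("statement does not occur in the training set, so P(S) = 0")
    p_t = n_trend / n                                          # prior P(T)
    p_s = n_stmt / n                                           # prior P(S)
    p_s_given_t = n_stmt_trend / n_trend if n_trend else 0.0   # likelihood P(S|T)
    return p_s_given_t * p_t / p_s                             # Bayes' theorem


# Hypothetical statements built from concepts, labeled trend-indicating or neutral.
training = [
    (("oil", "price", "rise"), True),
    (("oil", "price", "rise"), True),
    (("oil", "price", "rise"), False),
    (("football", "result"), False),
]
print(posterior_trend(training, ("oil", "price", "rise")))
```
</code></p><p>With raw relative frequencies, the result equals the share of trend-labeled occurrences of S (here 2 of 3); statements that are rare in the training data would likely need some form of smoothing.</p></div>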
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Evaluation</head><p>The evaluation of my approach is primarily based on the performance of the learnt model, which can be assessed using cross-validation and measured in general by recall and precision. For cross-validation, the document corpus is divided into i folds and the validation process is repeated i times: in every step, 1/i of the document corpus is used as the test set, while the remaining (i-1)/i is used for building the learning model. Let D be the set of documents and |D| the number of documents in a set; recall and precision are then defined as given in Figure 1.</p><note place="foot" n="15">However, decision trees (good for visualizing and comprehending the model) and support vector machines (a highly reliable classification method) also have to be considered.</note></div>
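<div xmlns="http://www.tei-c.org/ns/1.0"><p>The i-fold cross-validation described above, together with recall, precision and the relative absolute error given with Figure 1, can be sketched in Python as follows. This is an illustration under stated assumptions: documents are identified by integer indices and assigned to folds round-robin without shuffling or stratification, and all function names are hypothetical.</p><p><code lang="python">
```python
def k_fold_splits(n_docs: int, i: int) -> list[tuple[list[int], list[int]]]:
    """Split document indices 0..n_docs-1 into i folds and return (train, test) pairs.

    Documents are assigned to folds round-robin (no shuffling or stratification).
    """
    if not i > 1:
        raise ValueError("need at least two folds")
    folds = [list(range(f, n_docs, i)) for f in range(i)]
    splits = []
    for f in range(i):
        test = folds[f]                                            # 1/i of the corpus
        train = [d for g in range(i) if g != f for d in folds[g]]  # remaining (i-1)/i
        splits.append((train, test))
    return splits


def recall_precision(trend_indicating: set, retrieved: set) -> tuple[float, float]:
    """Recall and precision of the retrieved documents with respect to trend-indicating ones."""
    hits = len(trend_indicating.intersection(retrieved))
    recall = hits / len(trend_indicating) if trend_indicating else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def relative_absolute_error(predicted: list[float], actual: list[float]) -> float:
    """Sum of |p_j - a_j| divided by the sum of |a_j - mean(a)|."""
    mean_a = sum(actual) / len(actual)
    numerator = sum(abs(p - a) for p, a in zip(predicted, actual))
    denominator = sum(abs(a - mean_a) for a in actual)
    if denominator == 0:
        raise ValueError("all actual values are equal; the error is undefined")
    return numerator / denominator


splits = k_fold_splits(n_docs=10, i=5)
print(splits[0])
print(recall_precision({1, 2, 3, 4}, {3, 4, 5}))
print(relative_absolute_error([2.0, 4.0], [1.0, 3.0]))
```
</code></p><p>In the example, two of the four trend-indicating documents are retrieved among three retrieved documents, giving a recall of 0.5 and a precision of 2/3.</p></div>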
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Future Work</head><p>Many research issues are relevant to this thesis. From the information retrieval point of view, one of them is research on graph-based representation models for documents and on semantic indexing of document collections. At this stage of the work, it is too early to elaborate on the remaining issues.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label>1</label><figDesc>Recall and precision, where |D|_trendindicating is the number of trend-indicating documents, |D|_retrieved the number of retrieved documents, and |D|_trendindicating-and-retrieved the number of documents that are both: <formula>recall = \frac{|D|_{\text{trendindicating and retrieved}}}{|D|_{\text{trendindicating}}}, \qquad precision = \frac{|D|_{\text{trendindicating and retrieved}}}{|D|_{\text{retrieved}}}</formula> (8). For numeric prediction, the relative absolute error can also be applied: <formula>\frac{|p_1 - a_1| + \ldots + |p_n - a_n|}{|a_1 - \bar{a}| + \ldots + |a_n - \bar{a}|}</formula> (9), where p_1, p_2, ..., p_n are the predicted values for the test instances, a_1, a_2, ..., a_n the actual values, and ā the mean of the actual values. These formulas only give an insight into possible evaluation measures. The final evaluation depends on the final learning model and should also take the knowledge integration part into account (in the case of decision trees, e.g., by additionally measuring changes in information gain values).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Definition 4: Utility. Utility expresses how popular users find a given word w in the given time window. I propose to retrieve it by analysing collaborative tagging systems (CTB), e.g. delicious, and estimating the popularity of the given word as a tag in the same time window as used for the trend estimation. The popularity can be described simply as the number of resources in the CTB that have been tagged with w in the given time window divided by the number of all resources tagged in this time window, scaled by the logarithm (Equation 5). Definition 5: Trend indication combines burstiness, interestingness and utility, divided by ratio(t_window), the number of time slices in the window (Equation 6).</figDesc><table><row><cell>Definition 4: Utility</cell><cell>\mathrm{util}(w)_{t_{\mathrm{window}}} := \log \frac{|R|_{(tag=w),\, t_{\mathrm{window}}}}{|R|_{(tag),\, t_{\mathrm{window}}}}</cell><cell>(5)</cell></row><row><cell>Definition 5: Trend indication</cell><cell>\mathrm{trendind}(w)_{t_{\mathrm{window}}} = \frac{\mathrm{burst}(w)_{t_{\mathrm{slice}_k}} + \mathrm{interest}(w)_{t_{\mathrm{slice}}} \cdot \mathrm{util}(w)_{t_{\mathrm{window}}}}{\mathrm{ratio}(t_{\mathrm{window}})}</cell><cell>(6)</cell></row></table></figure>
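<div xmlns="http://www.tei-c.org/ns/1.0"><p>Utility (Definition 4), rising interestingness and the emerging-topic intersection from Section 4 can be sketched in Python as follows. The sketch assumes that the tagging data of the CTB is available as a list of (resource, tag) pairs already restricted to the time window, counts distinct resources, and uses the natural logarithm because the base is not stated. As a simplification, it approximates the trend-indicating words by the words whose interestingness rises strictly over the window rather than evaluating the full trend-indication formula. The function names and toy data are hypothetical.</p><p><code lang="python">
```python
import math


def utility(word: str, tagging: list[tuple[str, str]]) -> float:
    """Utility (Eq. 5): log of the share of tagged resources that carry the tag `word`.

    `tagging` holds (resource, tag) pairs from a collaborative tagging system,
    assumed to be restricted to the time window already.
    """
    all_resources = {res for res, _ in tagging}
    tagged_with_word = {res for res, tag in tagging if tag == word}
    if not tagged_with_word:
        return float("-inf")  # log(0): the word is not used as a tag in this window
    return math.log(len(tagged_with_word) / len(all_resources))


def rising_interest(values: list[float]) -> bool:
    """True if the interestingness values f(w) of consecutive slices strictly increase."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


def emerging_topics(interest_by_word: dict[str, list[float]],
                    tagging: list[tuple[str, str]]) -> set[str]:
    """Words with rising interestingness that are also used as tags in the window."""
    rising = {w for w, values in interest_by_word.items() if rising_interest(values)}
    tags = {tag for _, tag in tagging}
    return rising.intersection(tags)


tagging = [("r1", "ebook"), ("r2", "ebook"), ("r2", "reader"), ("r3", "football")]
interest_by_word = {"ebook": [-1.0, 0.2, 0.6], "football": [0.5, 0.1, 0.3]}
print(utility("ebook", tagging))  # log(2 / 3): two of three tagged resources
print(emerging_topics(interest_by_word, tagging))
```
</code></p><p>Here "ebook" is both rising in interestingness and used as a tag, so it is the only emerging-topic candidate; "football" is tagged but its interestingness does not rise.</p></div>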
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.google.de/trends</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://www.blogpulse.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://www.opencalais.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">http://conceptnet.media.mit.edu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://commons.media.mit.edu/en/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">http://moat-project.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">http://wordnet.princeton.edu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">http://sentiwordnet.isti.cnr.it/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">http://wortschatz.uni-leipzig.de/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_9">http://www.dwds.de/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_10">http://www.w3.org/2004/02/skos/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_11">http://scot-project.org/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been partially supported by the InnoProfile-Corporate Semantic Web project funded by the German Federal Ministry of Education and Research (BMBF) and the BMBF Innovation Initiative for the New German Länder - Entrepreneurial Regions. The author thanks Prof. Robert Tolksdorf and Prof. Abraham Bernstein for their helpful comments on the content of this thesis.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Querying shapes of histories</title>
		<author>
			<persName><forename type="first">Rakesh</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edward</forename><forename type="middle">L</forename><surname>Wimmers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohamed</forename><surname>Zait</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page" from="502" to="514" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Events and the causes of events</title>
		<author>
			<persName><forename type="first">Khurshid</forename><surname>Ahmad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Making Money in the Financial Services Industry, at the 6th International Conference on Terminology and Knowledge Engineering</title>
				<editor>
			<persName><forename type="first">Lee</forename><surname>Gillam</surname></persName>
		</editor>
		<meeting>the Workshop on Making Money in the Financial Services Industry, at the 6th International Conference on Terminology and Knowledge Engineering</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Topic Detection and Tracking: Event-based Information Organization</title>
		<author>
			<persName><forename type="first">James</forename><surname>Allan</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2002">2002</date>
			<publisher>Kluwer Academic Publishers</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">On-line new event detection and tracking</title>
		<author>
			<persName><forename type="first">James</forename><surname>Allan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ron</forename><surname>Papka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Victor</forename><surname>Lavrenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR &apos;98: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="37" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Survey of Text Mining: Clustering, Classification, and Retrieval</title>
		<author>
			<persName><forename type="first">Michael</forename><surname>Berry</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004">2004</date>
			<publisher>Springer Science+Business Media, Inc</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Currency exchange rate forecasting from news headlines</title>
		<author>
			<persName><forename type="first">Desh</forename><surname>Peramunetilleke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Raymond</forename><forename type="middle">K</forename><surname>Wong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings 13th Australasian Database Conference</title>
				<meeting>13th Australasian Database Conference</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="131" to="139" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Activity monitoring: Noticing interesting changes in behavior</title>
		<author>
			<persName><forename type="first">Tom</forename><surname>Fawcett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Foster</forename><surname>Provost</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining</title>
				<meeting>the Fifth International Conference on Knowledge Discovery and Data Mining</meeting>
		<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="53" to="62" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Trend detection based on a fuzzy temporal profile model</title>
		<author>
			<persName><forename type="first">Paulo</forename><surname>Félix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Santiago</forename><surname>Fraga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roque</forename><surname>Marín</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Senén</forename><surname>Barro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">AI in Engineering</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="341" to="349" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Gillam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ahmad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ahmad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Casey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Taskaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">C F</forename><surname>Oliveira</surname></persName>
		</author>
		<author>
			<persName><surname>Manomaisupat</surname></persName>
		</author>
		<title level="m">Economic news and stock market correlation: A study of the UK market</title>
				<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Data Mining: Concepts and Techniques</title>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kamber</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
			<publisher>Morgan Kaufmann Publishers Inc</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Design science in information systems research</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Hevner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>March</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ram</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">MIS Quarterly</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="75" to="106" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Bursty and hierarchical structure in streams</title>
		<author>
			<persName><forename type="first">Jon</forename><surname>Kleinberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">KDD &apos;02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="91" to="101" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">A Survey of Emerging Trend Detection in Textual Data Mining</title>
		<author>
			<persName><forename type="first">April</forename><surname>Kontostathis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leon</forename><surname>Galitsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">William</forename><forename type="middle">M</forename><surname>Pottenger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Soma</forename><surname>Roy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Daniel</forename><forename type="middle">J</forename><surname>Phelps</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2003">2003</date>
			<publisher>Springer-Verlag</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Use of term clusters for emerging trend detection</title>
		<author>
			<persName><forename type="first">April</forename><surname>Kontostathis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lars</forename><forename type="middle">E</forename><surname>Holzman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">William</forename><forename type="middle">M</forename><surname>Pottenger</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical report</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Mining of concurrent text and time series</title>
		<author>
			<persName><forename type="first">Victor</forename><surname>Lavrenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matt</forename><surname>Schmill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dawn</forename><surname>Lawrie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paul</forename><surname>Ogilvie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Jensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Allan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">proceedings of the 6 th ACM SIGKDD Int&apos;l Conference on Knowledge Discovery and Data Mining Workshop on Text Mining</title>
				<meeting>the 6 th ACM SIGKDD Int&apos;l Conference on Knowledge Discovery and Data Mining Workshop on Text Mining</meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="37" to="44" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Discovering trends in text databases</title>
		<author>
			<persName><forename type="first">Brian</forename><surname>Lent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rakesh</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ramakrishnan</forename><surname>Srikant</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1997">1997</date>
			<publisher>AAAI Press</publisher>
			<biblScope unit="page" from="227" to="230" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">NewsCATS: A news categorization and trading system</title>
		<author>
			<persName><forename type="first">Marc-Andre</forename><surname>Mittermayer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerhard</forename><forename type="middle">F</forename><surname>Knolmayer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Data Mining</title>
				<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="1002" to="1007" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Tracking dynamics of topic trends using a finite mixture model</title>
		<author>
			<persName><forename type="first">Satoshi</forename><surname>Morinaga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenji</forename><surname>Yamanishi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</title>
				<meeting>the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining<address><addrLine>Seattle, Washington, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">August 22-25, 2004</date>
			<biblScope unit="page" from="811" to="816" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">Olga</forename><surname>Streibel</surname></persName>
		</author>
		<title level="m">XML-Clearinghouse report 17: XML-technologies and semantic web for trend mining in business applications</title>
				<imprint>
			<publisher>XML-Clearinghouse Project</publisher>
			<date type="published" when="2007">2007</date>
		</imprint>
		<respStmt>
			<orgName>Freie Universität Berlin</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical report</note>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Detecting emerging concepts in textual data mining</title>
		<author>
			<persName><forename type="first">William</forename><forename type="middle">M</forename><surname>Pottenger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ting-Hao</forename><surname>Yang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="89" to="105" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Automatic Text Processing</title>
		<author>
			<persName><forename type="first">Gerard</forename><surname>Salton</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1989">1989</date>
			<publisher>Addison-Wesley</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">The sciences of the artificial</title>
		<author>
			<persName><forename type="first">Herbert</forename><forename type="middle">A</forename><surname>Simon</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1996">1996</date>
			<publisher>MIT Press</publisher>
			<pubPlace>Cambridge, MA, USA</pubPlace>
		</imprint>
	</monogr>
	<note>3rd ed</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Trend ontology for knowledge-based trend mining in textual information</title>
		<author>
			<persName><forename type="first">Olga</forename><surname>Streibel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malgorzata</forename><surname>Mochol</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">7th International Conference on Information Technology: New Generations</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1285" to="1288" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">TimeMines: Constructing timelines with statistical models of word usage</title>
		<author>
			<persName><forename type="first">Russell</forename><surname>Swan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Jensen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">KDD-2000 Workshop on Text Mining</title>
				<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Extreme tagging: Emergent semantics through the tagging of tags</title>
		<author>
			<persName><forename type="first">Vlad</forename><surname>Tanasescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Olga</forename><surname>Streibel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ESOE</title>
				<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="84" to="94" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Anatomy of a Trend</title>
		<author>
			<persName><forename type="first">Henrik</forename><surname>Vejlgaard</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
			<publisher>McGraw-Hill</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Daily prediction of major stock indices from textual WWW data</title>
		<author>
			<persName><forename type="first">B</forename><surname>Wüthrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Permunetilleke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Leung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98)</title>
				<meeting>the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98)</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="364" to="368" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
