<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The Topics they are a-Changing -Characterising Topics with Time-Stamped Semantic Graphs</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">A</forename><forename type="middle">Elizabeth</forename><surname>Cano</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Knowledge Media Institute</orgName>
								<orgName type="institution">Open University</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yulan</forename><surname>He</surname></persName>
							<email>y.he@cantab.net</email>
							<affiliation key="aff1">
								<orgName type="department">School of Engineering and Applied Science</orgName>
								<orgName type="institution">Aston University</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Harith</forename><surname>Alani</surname></persName>
							<email>h.alani@open.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="department">Knowledge Media Institute</orgName>
								<orgName type="institution">Open University</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The Topics they are a-Changing -Characterising Topics with Time-Stamped Semantic Graphs</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">76E89786E7BC1090D9F5C48DCB970989</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:08+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>social media</term>
					<term>topic detection</term>
					<term>DBpedia</term>
					<term>concept drift</term>
					<term>feature relevance decay</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>DBpedia has become one of the major sources of structured knowledge extracted from Wikipedia. Such structures gradually re-shape the representation of Topics as new events relevant to such topics emerge. Such changes make evident the continuous evolution of topic representations and introduce new challenges to supervised topic classification tasks, since labelled data can rapidly become outdated. Here we analyse topic changes in DBpedia and propose the use of semantic features as a more stable representation of a topic. Our experiments show promising results in understanding how the relevance of features to a topic changes over time.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Supervised topic classifiers that depend on labelled data can rapidly become outdated as new information about their topics emerges. This challenge becomes apparent when applying topic classifiers to streaming data such as Twitter. The continuous change of vocabulary, in many cases event-dependent, makes retraining such classifiers with fresh topic-label annotations costly. In event-dependent topics, not only do new lexical features re-characterise the topic, but existing features can also become irrelevant to it (e.g., Jan25, which was relevant to violence during the Egyptian revolution, is now less relevant to current representations of the topic violence). In dynamic environments, the expectation that a topic's features stay within the same feature space as they progressively drift is not normally met.</p><p>The incorporation of new event data into a topic representation leads to a linguistic evolution of the topic, but also to a change in its semantic structure. To the best of our knowledge, none of the existing approaches for topic classification using semantic features <ref type="bibr" target="#b3">[4]</ref>[2][5] <ref type="bibr" target="#b6">[7]</ref> has focused on the epoch-based transfer learning task. In this paper we aim to disseminate our work presented in <ref type="bibr" target="#b0">[1]</ref> by summarising our proposed transfer learning approach for the epoch-based topic classification of tweets. In <ref type="bibr" target="#b0">[1]</ref> we investigate whether the use of semantic features, as opposed to lexical features, can provide a more stable representation of a topic. Here we extend that work by using infographics to present the gain in F-measure across cross-epoch settings for both lexical and semantic features. This enables us to highlight the relevance of the studied semantic features over the lexical ones.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Evolving Topics</head><p>DBpedia is periodically updated to incorporate additions and modifications in Wikipedia. This enables us to track how specific resources evolve over time by comparing them across successive DBpedia editions. For example, changes to the semantic graph for the concept Barack Obama can be derived from snapshots of this resource's semantic graph taken from different DBpedia dumps. In Figure <ref type="figure" target="#fig_0">1</ref>, for instance, some of the triples remain unchanged in consecutive dumps, while new triples provide further information on the resource. Changes to a resource are exposed both through new semantic features (i.e., triples) and through new lexical features, which appear as changes in the resource's abstract. In DBpedia a topic can be represented by the collection of resources belonging to the main topic (e.g. cat:War) together with resources (e.g. dbp:Combat assessment) belonging to subcategories (e.g. cat:Military operations) of the main topic. A topic's evolution can therefore be tracked by following changes in the existing and new resources that belong to it.</p></div>
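<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the snapshot comparison described above, the following Python sketch diffs two sets of triples for one resource. The function name diff_snapshots, the ex: prefixes and the triples themselves are hypothetical placeholders written for this example; they are not taken from any DBpedia dump or from the pipeline of the original work.</p><p>
```python
# Hypothetical, hand-written triples; not taken from any real DBpedia dump.
def diff_snapshots(old, new):
    """Return the triples kept, added and removed between two snapshots."""
    old_set, new_set = set(old), set(new)
    return {
        "unchanged": old_set.intersection(new_set),
        "added": new_set - old_set,
        "removed": old_set - new_set,
    }


# Two made-up snapshots of the same (hypothetical) resource.
dump_a = [
    ("ex:Person_X", "rdf:type", "ex:Politician"),
    ("ex:Person_X", "ex:office", "ex:Senator"),
]
dump_b = [
    ("ex:Person_X", "rdf:type", "ex:Politician"),
    ("ex:Person_X", "ex:office", "ex:President"),
    ("ex:Person_X", "ex:award", "ex:Some_Prize"),
]

changes = diff_snapshots(dump_a, dump_b)
for kind in ("unchanged", "added", "removed"):
    print(kind, sorted(changes[kind]))
```
</p><p>Treating each snapshot as a set of (subject, predicate, object) tuples keeps the comparison independent of triple order; a real dump would first need to be parsed into such tuples.</p></div>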
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Topic Classification with Time-Stamped Semantic Graphs</head><p>In <ref type="bibr" target="#b0">[1]</ref>, we propose a novel transfer learning <ref type="bibr" target="#b5">[6]</ref>[3] approach to address the classification of new data when the only available labelled data belongs to a previous epoch. The approach relies on the incorporation of knowledge from DBpedia graphs. It is summarised in Figure <ref type="figure" target="#fig_2">2</ref> and consists of the following stages: 1) Extraction of lexical and semantic features from tweets; 2) Time-dependent content modelling; 3) Strategy for weighting topic-relevant features with DBpedia; and 4) Construction of time-dependent topic classifiers based on lexical, semantic and joint features.</p><p>Our analysis involves two main feature types: lexical and semantic features. The semantic features consist of Class, Property, Category, and Resource. The semantic feature representation of a document is therefore built upon the collection of such features derived from the document's entities that are mapped to a DBpedia resource. The mapping targets the DBpedia dump that was available when the document was generated. In <ref type="bibr" target="#b0">[1]</ref>, we proposed different weighting strategies, some of which made use of graph properties of a topic in a DBpedia graph. Such strategies incorporated statistics of the topic graph representation considering a DBpedia graph at time t.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Construction of Time-Dependent Topic Classifiers</head><p>We focus on binary topic classification in epoch-based scenarios, where a classifier trained on a corpus from epoch t − 1 is tested on a corpus from epoch t. Our analysis targeted our hypothesis that, as opposed to lexical features, which are situation-dependent and can change progressively in time, semantic structures (including ontological classes and properties) can provide a more stable representation of a topic. Following the proposed weighting strategies, the semantic feature representations of the t − 1 corpus and the t corpus are both generated from the DBpedia graph available at t − 1. For example, when applying a classifier trained on data from 2010, the feature space of a target test set from 2011 is computed based on the DBpedia version used for training the 2010-based classifier. This simulates the availability of resources in a DBpedia graph at a given time. A semantic feature f in a document x is weighted by its Laplace-smoothed frequency in x and by its topic relevance in the DB_t graph:</p><formula xml:id="formula_0">W_x(f)_{DB_t} = \left[ \frac{N_x(f)_{DB_t} + 1}{|F| + \sum_{f' \in F} N_x(f')_{DB_t}} \right] \cdot \left( W_{DB_t}(f) \right)^{1/2}<label>(1)</label></formula><p>where N_x(f)_{DB_t} is the number of times feature f appears in all the semantic meta-graphs associated with document x derived from the DB_t graph; F is the vocabulary of the semantic feature type; and W_{DB_t}(f) is the weighting function corresponding to the semantic feature type, computed based on the DB_t graph. This weighting function captures the relative importance of a document's semantic features against the rest of the corpus and incorporates the topic-relative importance of these features in the DB_t graph.</p></div>
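<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal Python sketch of the weighting in Equation 1 follows, assuming the reading in which the Laplace-smoothed frequency of a feature is multiplied by the square root of its topic-relevance weight. The names semantic_weight, doc_counts, vocabulary and topic_weight, the feature labels and all numbers are made up for illustration; the topic-relevance weights are simply given here rather than computed from a DBpedia graph.</p><p>
```python
import math


def semantic_weight(feature, doc_counts, vocabulary, topic_weight):
    """Laplace-smoothed frequency of `feature` in one document, scaled by
    the square root of its topic relevance (assumed reading of Eq. 1)."""
    # Sum of N_x(f') over every feature f' in the vocabulary F.
    total = sum(doc_counts.get(f, 0) for f in vocabulary)
    # (N_x(f) + 1) / (|F| + total): add-one smoothing over the vocabulary.
    smoothed = (doc_counts.get(feature, 0) + 1) / (len(vocabulary) + total)
    return smoothed * math.sqrt(topic_weight[feature])


# Hypothetical vocabulary F, per-document counts N_x(f), and weights W_DB_t(f).
vocabulary = ["class:Conflict", "class:Person", "class:Place"]
doc_counts = {"class:Conflict": 2, "class:Person": 1}
topic_weight = {"class:Conflict": 0.81, "class:Person": 0.25, "class:Place": 0.04}

weights = {
    f: semantic_weight(f, doc_counts, vocabulary, topic_weight)
    for f in vocabulary
}
for f in vocabulary:
    print(f, round(weights[f], 4))
```
</p><p>Because of the add-one smoothing, a feature that never occurs in the document (class:Place here) still receives a small non-zero weight, which is then damped by its low topic relevance.</p></div>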
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments</head><p>We evaluated our approach using two collections: DBpedia and Twitter datasets. The DBpedia collection comprises four DBpedia dumps (3.6 to 3.9) <ref type="foot" target="#foot_0">4</ref> . The Twitter datasets consist of a collection of Violence-related topics: Disaster Accident, Law Crime and War Conflict. Each of these datasets comprises three epoch-based collections of tweets, corresponding to 2010, 2011, and 2013. The Twitter dataset contained 12,000 annotated tweets <ref type="foot" target="#foot_1">5</ref> . To compare the overall benefit of the proposed weighting strategies against the baselines on these three topics, we averaged the precision (P), recall (R) and F-measure of the three cross-epoch settings for each topic. Table <ref type="table">1</ref> presents a summarised version of our results in <ref type="bibr" target="#b0">[1]</ref>, showing only the best performing features. We can see that, on average, the Class-based semantic features improve upon the bag-of-words (BoW) features in F-measure. This indicates that the use of ontological classes is a more stable option for the representation of a topic. To analyse the differences in F-measure gain for each topic and each of the examined features, we used the radar plots in Figure <ref type="figure">3</ref>. In this figure a positive value indicates an improvement in the classifier. Semantic features improve upon lexical features in the three topics, and the weighted features for resource, class and category all show a positive improvement in these scenarios. Moreover, the class-based features consistently outperform BoW in all three topics. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Triples of the Barack Obama resource extracted from different DBpedia dumps (3.6 to 3.8). 
Each DBpedia dump presents a snapshot in time of factual information of a resource.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Architecture for backtrack mapping of resources to DBpedia dumps and deriving topic-relevance based features for epoch-dependent topic classification.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Summary of performance decays for each feature for each Topic on the three cross-epoch scenarios.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_0">General statistics of these dumps are available at http://wiki.dbpedia.org/Downloads39</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_1">Further information about this dataset is available in <ref type="bibr" target="#b0">[1]</ref>.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Stretching the life of twitter classifiers with time-stamped semantic graphs</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">E</forename><surname>Cano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Alani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC 2014</title>
		<title level="s">Proceedings, Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Riva del Garda, Trentino, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">Oct 19-23, 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Harnessing linked knowledge source for topic classification in social media</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">E</forename><surname>Cano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Varga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rowe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ciravegna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 24th ACM Conf. on Hypertext and Social Media (Hypertext)</title>
				<meeting>24th ACM Conf. on Hypertext and Social Media (Hypertext)<address><addrLine>Paris, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Multitask learning</title>
		<author>
			<persName><forename type="first">R</forename><surname>Caruana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="41" to="75" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Discovering context: classifying tweets through a semantic transform based on wikipedia</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Genc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sakamoto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">V</forename><surname>Nickerson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th international conference on Foundations of augmented cognition: directing the future of adaptive systems, FAC&apos;11</title>
				<meeting>the 6th international conference on Foundations of augmented cognition: directing the future of adaptive systems, FAC&apos;11<address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="484" to="492" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Incorporating sentiment prior knowledge for weakly supervised sentiment analysis</title>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Asian Language Information Processing</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">19</biblScope>
			<date type="published" when="2012-06">June 2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Is learning the n-th thing any easier than learning the first?</title>
		<author>
			<persName><forename type="first">S</forename><surname>Thrun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<publisher>The MIT Press</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="640" to="646" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Linked knowledge sources for topic classification of microposts: A semantic graph-based approach</title>
		<author>
			<persName><forename type="first">A</forename><surname>Varga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rowe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ciravegna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Web Semantics: Science</title>
		<imprint>
			<date type="published" when="2014">2014</date>
			<publisher>JWS</publisher>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
