<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Mapping Tweets to Conference Talks: A Goldmine for Semantics</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Milan</forename><surname>Stankovic</surname></persName>
							<email>milan.stankovic@hypios.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Hypios Research</orgName>
								<address>
									<addrLine>187 rue du Temple</addrLine>
									<postCode>75003</postCode>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="laboratory">STIH</orgName>
								<orgName type="institution">Université Paris-Sorbonne</orgName>
								<address>
									<addrLine>28 rue Serpente</addrLine>
									<postCode>75005</postCode>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Matthew</forename><surname>Rowe</surname></persName>
							<email>m.c.rowe@open.ac.uk</email>
							<affiliation key="aff1">
								<orgName type="department">Knowledge Media Institute</orgName>
								<orgName type="institution">The Open University</orgName>
								<address>
									<addrLine>Milton Keynes</addrLine>
									<postCode>MK7 6AA</postCode>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Philippe</forename><surname>Laublet</surname></persName>
							<email>philippe.laublet@paris-sorbonne.fr</email>
							<affiliation key="aff2">
								<orgName type="laboratory">STIH</orgName>
								<orgName type="institution">Université Paris-Sorbonne</orgName>
								<address>
									<addrLine>28 rue Serpente</addrLine>
									<postCode>75005</postCode>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Mapping Tweets to Conference Talks: A Goldmine for Semantics</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">6EADED12AFC463DAF39CFE1A7593CB3E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T00:59+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Semantic Web</term>
					<term>Social Web</term>
					<term>Twitter</term>
					<term>User Profiling</term>
					<term>Linked Data</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The short message service Twitter has gained significant popularity and uptake among participants of conferences and organized events as a backchannel for intra-event communication. Information that is exchanged explicitly through such tweets, or that is implicitly present in them, remains mostly hidden and undecipherable to machines. In this paper we propose a framework for extracting valuable information from conference tweets, enabling its publication as Linked Data. We introduce the concept of mapping tweets with the talks and subevents that they refer to, in doing so gaining access to additional information about the users, talks and dynamics of the event. We present preliminary results of our work towards tweet-talk mappings and motivate our current and future work by giving several use cases for such extracted data.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In recent years, Twitter<ref type="foot" target="#foot_0">1</ref> communication among participants has become an important element of conferences and participatory events. Rather than whispering during talks, scientists, researchers and professionals of all kinds now use Twitter to pass on their thoughts, exchange impressions about the events they are attending, share links and useful information, and interact in real time. At some events Twitter is also used as a medium for collecting audience questions. Many conferences now publish a Twitter #hashtag in their official program, allowing participants to easily follow what others are tweeting. Given this context, it is clear that a lot of information about an event is captured in Twitter streams. Extracting such information requires techniques able to interpret and leverage the content of a tweet so that analyses can be performed automatically, given that the scale and uptake of Twitter hinder manual analysis. Furthermore, tweets at conferences often discuss subevents such as talks, presentations and keynote speeches. Aligning tweets with such subevents would enable conference feedback, by assessing the sentiment of tweets and identifying the most popular and unpopular aspects of a conference, and user profiling, by modeling users based on their areas of interest and suggesting talks or users to speak with based on shared contextual information.</p><p>In this paper we propose a solution for mapping tweets to conference talks. Our solution enriches tweets with semantics using the SIOC ontology, identifies the DBPedia concepts that the tweets mention, and maps tweets to talks using an automated technique. We have collected a dataset of tweets produced during the Extended Semantic Web Conference 2010, for which we have preliminary results from converting the tweets into triples and identifying the DBPedia concepts that they mention. 
As this paper presents work in progress, we use human-provided mappings to demonstrate the benefits of this data for providing feedback to conference organizers and for compiling user interest profiles.</p><p>We have structured this paper as follows: Section 2 presents the key issues associated with the extraction of data from Twitter and explains our general approach to overcoming them. In Section 3 we present concrete use-cases that motivate our work on tweet-talk mappings and the exploration of Twitter conference data in general. Section 4 presents our framework for information extraction from Twitter conference archives and for the semantic enrichment of the data. Section 5 provides statistics and discussion about the concrete data. We demonstrate the utility of the framework on several concrete examples in Section 6. Section 7 lists some of the related research initiatives, and finally in Section 8 we conclude the paper and give future work directions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Issues with Twitter</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Amount of Tweets</head><p>Although all tweets may be considered a generally useful source of information, the growing rate of new tweets (more than 50 million per day<ref type="foot" target="#foot_1">2</ref>), combined with Twitter API limitations, makes this source almost impossible to process in real time. A strategy for filtering tweets before processing is therefore required. Another issue is the number of tweets that are poor in information content; those tweets mostly express emotions, phatic communication or similar content. Our approach to overcoming this issue is to focus on tweets that are created during conferences and professional events. Apart from the intuition that those tweets might be richer in meaning, there are studies that support such a choice (e.g., <ref type="bibr" target="#b0">[1]</ref> and <ref type="bibr" target="#b1">[2]</ref>). For instance, the authors of <ref type="bibr" target="#b0">[1]</ref> claim that only 15% of conference tweets are of a personal nature, which suggests that conference tweets contain fewer of the phatic messages that naturally occur in personal communication. Because they contain many relevant tweets, conferences represent a convenient filter for getting to information-rich tweets. Section 3 presents the possible uses of conference tweets and motivates our focus on this particular type of Twitter content.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Swift Disappearing of Tweets</head><p>Due to scalability problems and the enormous number of tweets that the service hosts, Twitter provides only limited access to tweet history. Tweets disappear from public searches after 4-7 days, a time frame that keeps shrinking, which makes processing past tweets difficult. To overcome this issue, archiving systems for Twitter have appeared. The most ambitious is Topsy<ref type="foot" target="#foot_2">3</ref>, a service that tries to recreate an archive of the most important tweets. Although the archive goes far back in history, it contains only a portion of all the tweets published in the Twittersphere. An alternative approach is offered by Twitter archiving services (e.g., TweetBackup<ref type="foot" target="#foot_3">4</ref>, TwapperKeeper<ref type="foot" target="#foot_4">5</ref>) and desktop tools (e.g., Archivist<ref type="foot" target="#foot_5">6</ref>, Twinbox<ref type="foot" target="#foot_6">7</ref>). On TwapperKeeper in particular, once an archive is created for a particular #hashtag, it becomes publicly available. We have chosen to rely on Twitter archives for our extraction process, and to use TwapperKeeper because the service contains valuable archives for over 600 conference #hashtags, created by conscientious participants.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Use-cases for Twitter Conference Data</head><p>Several use-cases motivate our work on leveraging semantics from conference tweets. They are based on the assumptions that: (1) conference tweets can be accessed; (2) topics of tweets can be identified; (3) conference subevents are provided; and (4) correspondences can be drawn between tweets and the conference subevents (e.g., a talk or a workshop) that the tweets refer to. These tweet-talk mappings represent the key point of our approach and make a true difference in the following scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Digital Memory</head><p>Many users use Twitter to share links and other useful observations during conferences. According to studies in <ref type="bibr" target="#b2">[3]</ref>, information sharing represents an important portion of tweeting activities. Tweets remain as a trace of such exchanges and allow the user to find the shared information later. The conference in question provides a context for the user's memory, allowing information to be retrieved more easily. Furthermore, many tweets from conferences are about a particular talk or subevent that the user has attended, providing an additional level of digital memory.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Sentiment Analysis, Conversational Aspects and Conference Feedback</head><p>Twitter has already been the subject of sentiment analysis research, and general systems for analyzing the emotions of tweets have been proposed <ref type="bibr" target="#b3">[4]</ref>. Since conference tweets may correspond to conference subevents, such tweets could be used to measure the overall opinion of talks and similar presentations, thereby providing useful feedback for presenters and conference organisers alike. Another interesting direction is the study of the conversational aspects of tweets, such as discussions formed through chains of @replies and memes formed by retweets; these aspects of Twitter have already been the subject of substantial studies <ref type="bibr" target="#b4">[5]</ref>, <ref type="bibr" target="#b5">[6]</ref>. The intensity of the conversations and memes produced around a particular conference talk could help spot the most influential speakers and the most appealing topics. Furthermore, a key benefit of this approach is the contrast between implicit and explicit feedback collection: in the latter, conference attendees are given a form through which they express their perspective on the conference, whereas the former captures the implicit opinion that is divulged and shared between attendees.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">User Profiling and Expert Finding</head><p>Based on tweet-talk mappings we could infer that a user who has tweeted about a given talk, and has thereby shown interest in the event, is also interested in its topic. This is particularly pertinent at conferences, where attendees must select talks from among parallel tracks. Abstracts of talks are usually available online and contain useful topics that can be understood as identifiers of user interest. In our research we will focus on conferences whose metadata is already available in a semantic format on the Semantic Web Dog Food<ref type="foot" target="#foot_7">8</ref> website. We aim to demonstrate that tweet-talk mappings can help us improve the completeness of user interest profiles through the propagation of topics from talks to tweets and, by transitivity, the alignment of users with interests.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Interest Dynamics and Trending Topics</head><p>Based on the previous considerations of tweet-talk mappings and the propagation of topics from talks to tweets, we can imagine another use: tracking popular topics and the change in their popularity throughout the conference. Relations between topics that are present in DBPedia<ref type="foot" target="#foot_8">9</ref> could make this functionality even more sophisticated by taking into account topic proximity in the DBPedia graph.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Rich Activity Twitter/Event Data (RATED) Framework</head><p>We now describe the central contribution of this paper: a framework, known as RATED, for processing archives of conference tweets and extracting useful event-related semantics from a given corpus. Our approach is divided into three distinct stages, as shown in Figure <ref type="figure">1</ref>. The first stage is the production of Linked Data from tweets. This involves taking the basic structure and metadata of each tweet and publishing this data as triples, using the available ontologies and following the Linked Data principles; we describe this stage in greater detail in Section 4.1. In the second stage, we enrich the tweets with information about the topics that they discuss. This information takes the form of concepts identified by dereferenceable URIs, which associate a given tweet instance with one or more DBPedia concepts. Describing the implicit semantics of a tweet in this way is important for being able to search for tweets and analyse the topical dynamics of conferences. We detail this stage in Section 4.2.</p><p>Figure <ref type="figure">1</ref>. The core process of the RATED framework</p><p>In the final stage of the approach we address the biggest challenge in making the most of conference tweets: drawing a correspondence from a tweet to the particular talk or subevent that it refers to (whenever a tweet refers to a talk or subevent). This is the mapping stage of the approach, in which we propose to train a machine learning classifier for a multi-class classification problem, using talk URIs as class labels, and then to use the classifier to label the corpus of tweets. We explain our proposed method for the mapping task in Section 4.3. We have applied our approach to the tweets and event data from the recent Extended Semantic Web Conference 2010. 
Statistics and discussion about the concrete data are provided in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Tweet Data Extraction</head><p>We have built a Java-based parser that can process TwapperKeeper archives in comma-separated values (CSV) format. Once the tweets are imported, we use the official Twitter API<ref type="foot" target="#foot_9">10</ref> to retrieve the user and user account data (e.g., name, biography). The system then saves all this information in a triple store. At present the system is capable of using the Jena<ref type="foot" target="#foot_10">11</ref> and Talis<ref type="foot" target="#foot_11">12</ref> triple stores. An example tweet is shown below in the RDF snippet. We have tried to make the representation of tweets as general as possible in order to maximise the potential use of the published data. Tweets are therefore represented as MicroblogPosts in the sense of the SIOC ontology<ref type="foot" target="#foot_12">13</ref>, the maker of each tweet is described using the FOAF ontology<ref type="foot" target="#foot_13">14</ref>, and the actual act of tweeting is wrapped into an instance of OnlinePresence described using the OPO ontology <ref type="bibr" target="#b6">[7]</ref>. General properties such as titles are represented using the Dublin Core vocabulary<ref type="foot" target="#foot_14">15</ref>. Apart from tweets, the data set contains full descriptions of tweet authors and their Twitter user accounts.</p></div>
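<div xmlns="http://www.tei-c.org/ns/1.0"><p>The extraction stage described above can be illustrated with a minimal Python sketch (the parser itself is Java-based; Python is used here only for brevity). The sketch assumes hypothetical archive column names (id, from_user, text, created_at) and hypothetical URI patterns for tweets and users; none of these are taken from TwapperKeeper or from the published data set, and the sample row is invented. The choice of sioc:content and dcterms:created for the tweet text and timestamp is likewise an assumption. Each tweet is typed as a SIOC MicroblogPost, linked to its author with foaf:maker, and serialised as N-Triples lines; the OnlinePresence wrapper from the OPO ontology is omitted.</p>
<p><![CDATA[
```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"
SIOCT = "http://rdfs.org/sioc/types#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"


def literal(value):
    """Serialise a plain string as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def uri(value):
    """Wrap an absolute URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def tweet_to_triples(row):
    """Map one archive row (a dict) to a list of N-Triples lines."""
    # Hypothetical column names and URI patterns, chosen for illustration.
    tweet = uri("http://twitter.com/%s/status/%s" % (row["from_user"], row["id"]))
    maker = uri("http://twitter.com/%s#me" % row["from_user"])
    statements = [
        (tweet, uri(RDF_TYPE), uri(SIOCT + "MicroblogPost")),
        (tweet, uri(SIOC + "content"), literal(row["text"])),
        (tweet, uri(DCTERMS + "created"), literal(row["created_at"])),
        (tweet, uri(FOAF + "maker"), maker),
        (maker, uri(RDF_TYPE), uri(FOAF + "Agent")),
    ]
    return ["%s %s %s ." % s for s in statements]


def archive_to_triples(csv_text):
    """Convert a whole CSV archive (header row required) to N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(tweet_to_triples(row))
    return lines


# Invented sample archive with a single tweet.
SAMPLE = ('id,from_user,text,created_at\n'
          '1,alice,"Great talk on ""Linked Data""",2010-06-01T10:00:00Z\n')

for line in archive_to_triples(SAMPLE):
    print(line)
```
]]></p>
<p>Writing plain N-Triples keeps the sketch free of third-party dependencies; in practice the resulting lines would be loaded into a triple store such as those named above.</p></div>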
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Topic Enrichment</head><p>In order to make tweets more searchable and useful for the described scenarios, one needs to represent them in a structured format and associate them with the topics that they refer to. As a basic approach we process the text of the tweets using the Zemanta<ref type="foot">16</ref> keyword extraction API. This approach yields DBPedia concepts related to a tweet, but it is useful only for the limited number of tweets that directly mention a particular topic. In the example data set we produced from ESWC2010, tweet topics are represented using the dc:subject property.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Mapping Tweets to Conference Talks</head><p>Mapping conference tweets to the corresponding parts of conference events is a reference reconciliation task, i.e., associating a tweet URI with a talk URI. Our automated mapping solution is still a work in progress. We regard the problem as a multi-class classification task: given a collection of tweets T, our goal is to choose, for each tweet, the most appropriate class label from a set Y of URIs describing the conference events. As training data we will use the properties and attributes associated with each event URI, by dereferencing each URI and converting the instance description into a bag-of-words model. This provides a dataset of the form D = {(x, y)}, where x ∈ X denotes the collection of features for a URI and y ∈ Y is the class label for the corresponding talk.</p><p>Our intuition is that by converting each tweet into a bag-of-words representation similar to that of each talk (dereferencing the tweet URI and compiling a set of features) we can train a classifier using D and then classify each tweet, choosing the class label, and therefore the URI, with the highest classification confidence. Evaluating such a technique requires a gold standard against which the mappings generated by our automated technique can be compared. In the remainder of this section we describe the process by which we created this gold standard and, to provide contextual evidence of the usefulness of tweet-to-talk mappings, we present several use cases based on human-generated mappings in the following section.</p></div>
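<div xmlns="http://www.tei-c.org/ns/1.0"><p>The classification idea outlined above can be sketched as follows. This is only an illustration under stated assumptions, not the mapping technique itself, which is still work in progress: the classifier is a multinomial naive Bayes model with add-one smoothing and a uniform prior, chosen here purely for simplicity; the talk URIs and descriptions are invented; and the class name NaiveBayesMapper is hypothetical. Each talk description plays the role of a training pair (x, y), and a tweet is assigned the talk URI with the highest posterior probability.</p>
<p><![CDATA[
```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


class NaiveBayesMapper:
    """Multinomial naive Bayes with add-one smoothing, one class per talk URI."""

    def fit(self, talks):
        # talks: dict mapping a talk URI (class label y) to its description text.
        self.counts = {uri: bag_of_words(text) for uri, text in talks.items()}
        self.vocab = set()
        for bag in self.counts.values():
            self.vocab.update(bag)
        return self

    def scores(self, tweet_text):
        """Return {talk URI: posterior probability} under a uniform prior."""
        words = [w for w in bag_of_words(tweet_text).elements() if w in self.vocab]
        log_likelihood = {}
        for uri, bag in self.counts.items():
            total = sum(bag.values()) + len(self.vocab)
            log_likelihood[uri] = sum(math.log((bag[w] + 1) / total) for w in words)
        # Normalise in a numerically stable way.
        top = max(log_likelihood.values())
        unnormalised = {u: math.exp(v - top) for u, v in log_likelihood.items()}
        z = sum(unnormalised.values())
        return {u: v / z for u, v in unnormalised.items()}

    def predict(self, tweet_text):
        """Return the most likely talk URI and its classification confidence."""
        scores = self.scores(tweet_text)
        best = max(scores, key=scores.get)
        return best, scores[best]


# Invented talk descriptions standing in for dereferenced event URIs.
TALKS = {
    "http://example.org/talk/1": "Publishing linked data with SPARQL endpoints and triple stores",
    "http://example.org/talk/2": "Privacy in social networks and walled gardens",
}

mapper = NaiveBayesMapper().fit(TALKS)
label, confidence = mapper.predict("nice SPARQL demo on linked data")
print(label, round(confidence, 3))
```
]]></p>
<p>A confidence threshold could be added on top of predict to leave a tweet unmapped when no talk is a convincing match, mirroring the option given to the human evaluators of stating that a tweet corresponds to no talk.</p></div>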
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.1">Gold Standard Creation</head><p>To create a reliable set of correct tweet-talk mappings for evaluating our automatic mapping technique, we conducted a human classification of tweets with related talks. Three evaluators who attended the ESWC2010 conference performed the classification on a random subset of 200 tweets. Each evaluator mapped each tweet to one of the ESWC2010 talks available on the Semantic Web Dog Food website, or in some cases stated that the tweet does not correspond to any talk. The evaluation followed an iterative process inspired by the Delphi method <ref type="bibr" target="#b7">[8]</ref>, which is used to reach agreement among experts about a certain opinion (most commonly a prediction). In the first round, experts give their predictions and justify them. In the second round they read the predictions and justifications of the other experts, and can change their minds and thus agree on a common prediction. We used the same process to reach an agreement about tweet-talk mappings. To measure the agreement between raters in each phase we used the kappa (k) agreement metric defined in <ref type="bibr" target="#b8">[9]</ref>. The k-statistic measures the chance-corrected agreement between raters, using the confusion matrix shown in Table <ref type="table" target="#tab_0">1</ref>. Using the cell counts a, b, c and d defined there, the k-statistic is calculated using the following formula:</p><formula xml:id="formula_0">k = \frac{2(ad - bc)}{(a + c)(c + d) + (b + d)(a + b)}</formula><p>The average agreement between raters following the first iteration of the Delphi method was 0.328, which indicates a very low level of agreement, as 0.6 is generally agreed to be an acceptable value (see Table <ref type="table" target="#tab_1">2</ref>, "I round"). 
We then conducted a second round in which evaluators could see the mappings given by others and change their minds, or argue in favour of their previous mappings. After the second round, a satisfactory level of agreement (k = 0.820) had been reached. Therefore we took the mappings from the second round as our gold standard.</p><p>A tweet is mapped to its talk using a custom-defined property, http://ontologies.hypios.com/rated#refersTo, that is a subproperty of skos:related.</p><p>[...] significantly. This is expected, bearing in mind the retweet chains that emerge during conferences and the meme phenomenon that is inherent to Twitter.</p></div>
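<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a worked illustration of the agreement computation in this section, the sketch below evaluates the k-statistic from the four cells of the confusion matrix in Table 1 and cross-checks it against the equivalent definition in terms of observed and chance agreement. The cell counts are invented for the example, and the function names are hypothetical. The last lines recompute the average agreement per round from the pairwise values reported in Table 2.</p>
<p><![CDATA[
```python
def kappa(a, b, c, d):
    """Chance-corrected agreement for a 2x2 confusion matrix.

    a: both raters positive; b, c: the raters disagree; d: both negative.
    """
    denominator = (a + c) * (c + d) + (b + d) * (a + b)
    if denominator == 0:
        raise ValueError("kappa is undefined for an empty or degenerate matrix")
    return 2.0 * (a * d - b * c) / denominator


def kappa_from_rates(a, b, c, d):
    """Same statistic written as (observed - expected) / (1 - expected)."""
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (observed - expected) / (1 - expected)


def mean(values):
    return sum(values) / len(values)


# Invented counts, used only to exercise the formula.
example = kappa(40, 10, 5, 45)
print(round(example, 3), round(kappa_from_rates(40, 10, 5, 45), 3))

# Pairwise agreement values as reported in Table 2.
first_round = [0.330, 0.307, 0.348]
second_round = [0.905, 0.795, 0.761]
print(round(mean(first_round), 3), round(mean(second_round), 3))
```
]]></p>
<p>Both forms give the same value for any non-degenerate matrix, which is a convenient sanity check when implementing the statistic.</p></div>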
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 2. Histogram of number of tweets per topic when using different types of bindings</head><p>The histogram shown in Figure <ref type="figure">2</ref> visually sums up the above two tables. It shows how the distribution of topics cannot be bound to tweets directly for the tweet text (blue line), and that, conversely, the distribution of topics can be bound to tweets over the tweet-talk mappings (based on talk texts).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Application Scenarios</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Conference Feedback</head><p>It is easy to imagine a system that provides feedback on the most popular topics, based both on topics mined from tweets and on topics mined from the talks that tweets refer to. In our case, those topics would correspond to those in Table <ref type="table">3</ref> and Table <ref type="table">4</ref>. It is interesting to go further than this simple form of feedback by looking at the difference between the topics that dominate the tweets at a conference and those that dominate the talks. An example in our case is the topic Radio-frequency Identification, which is popular in tweets but not represented in talks. Such topics are the ones that people talk about in the corridors but that are not represented in the official curriculum. We believe that in some cases such topics might indicate the need to extend the curriculum of future conferences with the topics people talk about, or help detect future trends of interest based on regression analysis.</p><p>Another interesting possibility would be to look at the topics that appear in talks and rank them by how many times the talks that mention them have been tweeted about. This way we could identify topics that drew attention and those that did not. Because our current human mappings are incomplete, we could not deliver such an analysis at this stage, but this feature is planned for the next version of our system, which will perform the mappings automatically. If we look at the two tables of popular topics, it is clear that some topics are similar and may be considered almost the same in a given context (e.g., Semantics and Semantic Web). Therefore it is also interesting to explore the semantic proximity of topic concepts and include this information in the analysis of popularity. This is one of the challenges for our future work. 
Other future challenges include: associating sentiment information with the tweets and using it for the analysis of talk/presenter popularity, and mining conversations and memes in order to spot the most provocative talks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Improving User Interest Profiling with Tweet-Talk Mappings</head><p>We believe that tweet-talk mappings might enable us to gain greater insight into the real interests of users. Since tweets are short, it is difficult to extract topics from them reliably. On the other hand, if we can detect the talks that the tweets refer to, the talk abstracts would enable more substantial topic extraction and thus allow user interests to be inferred. To demonstrate this, we have used the human-generated mappings to connect users to more topics by propagating the topics of the talks to the users via the tweets, which serve as glue. Table <ref type="table" target="#tab_2">5</ref> demonstrates the case for the user Matthew Rowe, for whom we had 4 tweet-talk mappings. By accessing additional information in the form of talk abstracts, additional topics of interest are associated with the user. To gauge the accuracy of this propagation we asked the user to mark the topics that were irrelevant, which enabled the precision of the topic lists to be measured. For this single participant we obtained a precision of 0.714 for topics derived from tweets and 0.632 for topics derived from talks. The results of these first experiments motivate us to continue research in this direction, as they show that tweet-talk mappings might provide additional value for user interest profiling.</p></div>
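<div xmlns="http://www.tei-c.org/ns/1.0"><p>The propagation of topics from talks to users, and the precision measure used to assess it, can be sketched as below. All identifiers and data in the example are invented, and the function names are hypothetical. The counts behind the reported precision values are not given in the text; the final line only shows that a list of seven topics with two marked as irrelevant would yield the reported 0.714, which is offered as a plausibility check rather than as the actual data.</p>
<p><![CDATA[
```python
def propagate_topics(user_tweets, tweet_talk, talk_topics):
    """Return {user: set of topics} inherited from the talks their tweets map to."""
    profiles = {}
    for user, tweets in user_tweets.items():
        topics = set()
        for tweet in tweets:
            talk = tweet_talk.get(tweet)  # None when the tweet maps to no talk
            if talk is not None:
                topics.update(talk_topics.get(talk, ()))
        profiles[user] = topics
    return profiles


def precision(topics, irrelevant):
    """Share of the listed topics that the user did not mark as irrelevant."""
    topics = set(topics)
    if not topics:
        raise ValueError("precision is undefined for an empty topic list")
    return len(topics - set(irrelevant)) / len(topics)


# Invented toy data: two tweets by one user, one of them mapped to a talk.
USER_TWEETS = {"user1": ["tweet1", "tweet2"]}
TWEET_TALK = {"tweet1": "talkA"}
TALK_TOPICS = {"talkA": ["Semantic_Web", "Privacy"]}

profiles = propagate_topics(USER_TWEETS, TWEET_TALK, TALK_TOPICS)
print(sorted(profiles["user1"]))

# Seven placeholder topics, two of them marked irrelevant.
placeholder_topics = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
print(round(precision(placeholder_topics, ["t6", "t7"]), 3))
```
]]></p>
<p>Keeping the tweet-to-talk and talk-to-topic steps as separate lookups means the same function works whether the mappings come from human evaluators or from an automated classifier.</p></div>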
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Related Work</head><p>Twitter has inspired the Social Semantic Web research community to seek methods for extracting semantics from tweets. For instance, Shinavier <ref type="bibr" target="#b10">[11]</ref> proposes a framework that extracts and unifies the semantics from different nanoformats, i.e., syntaxes that allow meaning and references to be embedded in short tweet messages. If talks and subevents had their own specific references, such syntaxes might be applied to our use cases related to conferences. An inherent disadvantage in comparison to our method is the significant number of references that users would have to use to refer to particular subevents, as well as the need to enforce their disciplined use.</p><p>Wagner <ref type="bibr" target="#b11">[12]</ref> proposes the KASAS framework for analyzing awareness streams (including Twitter) and the concepts they contain. This system is more general and does not aim to generate mappings from tweets to content and real-life events in as contextualized a way as we do. Nevertheless, this approach could nicely complement our system by providing the functionality that is currently covered by Zemanta, and hopefully generate more related concepts than those from DBPedia. Similarly, Twarql<ref type="foot">19</ref> <ref type="bibr" target="#b12">[13]</ref> provides an extraction facility for Twitter streams, providing basic Twitter metadata in RDF with the addition of rich semantics for Twitter #hashtags. It would be interesting to see how this system could be combined with our tweet-talk mapping in the future. SMOB <ref type="bibr" target="#b13">[14]</ref>, the semantic microblogging tool, represents an alternative to Twitter that publishes tweets directly in semantic (RDF) form. It also provides DBPedia topics for all #hashtags used in the tweet. 
We could not use SMOB in our experiments because its user base is still insufficient, but in general this tool could be coupled with our tweet-subevent mapping method. Twitter analytic systems like 140kit.com are starting to emerge, but to our knowledge they still do not provide event/conference-oriented services.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">Conclusions and Future Work</head><p>In this paper we have described motivations for extracting structured data from conference tweets. We have presented concrete use cases and proposed a framework for Twitter conference data extraction. The key concept of our framework is to map tweets to the talks and subevents that they refer to, thereby gaining access to additional knowledge about event dynamics and user activities. Our initial experiments with human-provided tweet-talk mappings indicate that a mapping-based approach will provide significant benefits for the given use-cases (i.e., providing conference feedback, user profiling, digital memory and conference topic dynamics). However, the sheer size of the Twittersphere and the rate at which tweets are published require an automatic tweet-talk mapping technique. We have outlined how our technique will work, in an abstract sense, and have defined the task as a multi-class classification problem. The human-generated mappings, produced using the Delphi method, provide the necessary means through which we can assess the performance of our approach. In addition to the tweets already published, we plan to make this gold standard available to the community for use in the future.</p><p>Our approach has two limitations: its reliance on existing Twitter archives, and its assumption that Linked Data is available about an event and its subevents. The former assumption is quite realistic, as people are already incentivised to create and share Twitter archives. The latter is more difficult to achieve, but we believe that our work shows the numerous benefits of utilising Linked Data and would motivate more event organisers to provide their event data as Linked Data. 
One also needs to acknowledge that our approach is not limited to the particular service Twitter.com, and is actually generalisable to other microblogging services.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="19" xml:id="foot_18">http://twarql.sourceforge.net/</note><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Rater Agreement Confusion Matrix</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell>Rater 1</cell></row><row><cell></cell><cell></cell><cell>Positive</cell><cell>Negative</cell></row><row><cell>Rater 2</cell><cell>Positive</cell><cell>a</cell><cell>b</cell></row><row><cell></cell><cell>Negative</cell><cell>c</cell><cell>d</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Reaching evaluators' agreement about tweet-talk mappings</figDesc><table><row><cell></cell><cell cols="2">Agreement Function k</cell></row><row><cell></cell><cell>I round</cell><cell>II round</cell></row><row><cell>evaluator1 : evaluator2</cell><cell>0.330</cell><cell>0.905</cell></row><row><cell>evaluator1 : evaluator3</cell><cell>0.307</cell><cell>0.795</cell></row><row><cell>evaluator2 : evaluator3</cell><cell>0.348</cell><cell>0.761</cell></row><row><cell>Average Agreement</cell><cell>0.328</cell><cell>0.820</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 5 .</head><label>5</label><figDesc>Topics from tweets and talks that tweets are about for Matthew Rowe</figDesc><table><row><cell>Topics from tweets</cell><cell>Topics from talks that the tweets are about</cell></row><row><cell>SPARQL</cell><cell>Information</cell></row><row><cell>Linked_Data</cell><cell>Web_application</cell></row><row><cell>Russia</cell><cell>Walled_garden_(technology)</cell></row><row><cell>PageRank</cell><cell>Triplestore</cell></row><row><cell>Japan</cell><cell>Social_web</cell></row><row><cell>IPad</cell><cell>Social_network</cell></row><row><cell>England</cell><cell>Semantic_Web</cell></row><row><cell></cell><cell>Resource_Description_Framework</cell></row><row><cell></cell><cell>Relational_database</cell></row><row><cell></cell><cell>Probabilistic_analysis_of_algorithms</cell></row><row><cell></cell><cell>Privacy</cell></row><row><cell></cell><cell>News_agency</cell></row><row><cell></cell><cell>Nature</cell></row><row><cell></cell><cell>Graph_(mathematics)</cell></row><row><cell></cell><cell>Granularity</cell></row><row><cell></cell><cell>Data</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.twitter.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://mashable.com/2010/02/22/twitter-50-million-tweets/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://topsy.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">http://tweetbackup.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://twapperkeeper.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">http://visitmix.com/labs/archivist-desktop/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">http://www.techhit.com/TwInbox/twitter_plugin_outlook.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">http://data.semanticweb.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">http://dbpedia.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_9">http://apiwiki.twitter.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_10">http://jena.sourceforge.net/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_11">http://www.talis.com/platform/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="13" xml:id="foot_12">http://rdfs.org/sioc/spec/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="14" xml:id="foot_13">http://xmlns.com/foaf/spec/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="15" xml:id="foot_14">http://dublincore.org/documents/dcmi-terms/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="17" xml:id="foot_15">http://www.w3.org/TR/rdf-sparql-query/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgments. The work of Milan Stankovic has been partially funded by ANRT (French National Research Agency) under the grant number CIFRE N 789/2009. We are also grateful to the kind people at Talis, who provided us with a free Linked Data triple store to put the data online. Zemanta was also kind enough to provide us with a generous API call limit that enabled us to perform the topic enrichment of the tweets and talks.</p></div>
			</div>


			<div type="availability">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Existing evaluation metrics such as precision and recall, and therefore the F-measure with a suitable setting for β, are applicable in our scenario. However evaluation</p><note place="foot" n="16">http://developer.zemanta.com/</note><p>&lt;rdf:Description rdf:about="http://data.hypios.com/tweets/tweet-15162225891"&gt;
  &lt;dc:description&gt;A tweet :Noshir Contractor at #eswc2010 speaking of data-driven social network analysis of MMORPG such as WoW and EQ2&lt;/dc:description&gt;
  &lt;rdf:type rdf:resource="http://online-presence.net/opo/ns#OnlinePresence"/&gt;
  &lt;opo:customMessage&gt;
    &lt;sioct:MicroblogPost rdf:about="http://data.hypios.com/tweets/tweet-15162225891-cm"&gt;
      &lt;sioc:content&gt;Noshir Contractor at #eswc2010 speaking of data-driven social network analysis of MMORPG such as WoW and EQ2&lt;/sioc:content&gt;
      &lt;sioc:id&gt;15162225891&lt;/sioc:id&gt;
      &lt;dcterms:language&gt;en&lt;/dcterms:language&gt;
      &lt;foaf:maker rdf:resource="http://data.hypios.com/tweets/user-ciro"/&gt;
      &lt;dcterms:date&gt;2010-06-01T09:16:46+0200&lt;/dcterms:date&gt;
      &lt;dcterms:subject rdf:resource="http://dbpedia.org/resource/Social_network"/&gt;
    &lt;/sioct:MicroblogPost&gt;
  &lt;/opo:customMessage&gt;
  &lt;opo:declaredBy rdf:resource="http://data.hypios.com/tweets/user-ciro"/&gt;
  &lt;opo:startTime&gt;2010-06-01T09:16:46+0200&lt;/opo:startTime&gt;
  &lt;opo:publishedFrom&gt;
    &lt;opo:SourceOfPublishing rdf:about="http://data.hypios.com/tweets/tweet-15162225891-source"&gt;
      &lt;opo:sourceName&gt;http://twitter.com&lt;/opo:sourceName&gt;
    &lt;/opo:SourceOfPublishing&gt;
  &lt;/opo:publishedFrom&gt;
&lt;/rdf:Description&gt;</p></div>
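<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the role of the F-measure weighting concrete, the following is a minimal sketch of how precision, recall and the F-measure could be computed for a set of predicted tweet-talk mappings against gold-standard mappings. It is an illustration under stated assumptions rather than the evaluation code used in this work: the function names precision_recall and f_measure are hypothetical, the tweet and talk identifiers are invented toy data, and each mapping is assumed to be a (tweet, talk) pair.</p>
<p>
```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (the F-beta score)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


def precision_recall(predicted: set, gold: set) -> tuple:
    """Precision and recall of predicted (tweet, talk) pairs against gold pairs."""
    correct = len(predicted.intersection(gold))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: (tweet id, talk id) pairs, not taken from the dataset.
gold = {("t1", "talkA"), ("t2", "talkB"), ("t3", "talkA"), ("t4", "talkC")}
predicted = {("t1", "talkA"), ("t2", "talkC"), ("t3", "talkA")}

p, r = precision_recall(predicted, gold)
print(round(p, 3), round(r, 3), round(f_measure(p, r), 3))
```
</p>
<p>In this sketch, a beta greater than 1 weights recall more heavily and a beta smaller than 1 favours precision; which setting is suitable depends on the use case.</p></div>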
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">The Produced Dataset Statistics</head><p>To contextualize the benefits of our approach, we took the TwapperKeeper #eswc2010 hashtag archive for the Extended Semantic Web Conference (ESWC) 2010. This conference was chosen because we were able to engage its participants in the evaluations. All the Twitter data extracted is published in an online data store in accordance with the Linked Data principles <ref type="bibr" target="#b9">[10]</ref>, and a SPARQL<ref type="foot" target="#foot_15">17</ref> endpoint is accessible at the following location: http://data.hypios.com/tweets/sparql. In total we extracted 1082 ESWC tweets. Of these, 213 tweets had DBpedia topics directly associated with them using Zemanta, making a total of 252 connections between a tweet and a DBpedia topic. The most used topics are presented in Table <ref type="table">3</ref>. In our generation of human mappings, we produced a total of 89 mappings between tweets and ESWC talks. This is based on our subset of 200 tweets as explained in the previous section. All of the mappings are available through the above SPARQL endpoint<ref type="foot">18</ref>. The remaining 111 tweets were either not related to a talk, or related to a talk/subevent for which there was no information on the Semantic Web Dog Food website (such as panel discussions or informal gatherings). The talks themselves are also enriched with topics (by submitting their abstracts and titles to Zemanta), and the 89 tweet-talk bindings lead to a total of 255 topic bindings, which is more than the 252 topic bindings generated using simple tweet text for the whole dataset. Table <ref type="table">4</ref> shows the most used topics that appear with talks from the tweet-talk mappings. Based on these tables as well as the histogram in Figure <ref type="figure">2</ref>, it is clear that topics based on tweet text tend to be slightly more centered, i.e., some topics dominate more</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">How people are using Twitter during conferences</title>
		<author>
			<persName><forename type="first">W</forename><surname>Reinhardt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ebner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Beham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Costa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">5th EduMedia conference</title>
				<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Understanding how Twitter is used to spread scientific messages</title>
		<author>
			<persName><forename type="first">J</forename><surname>Letierce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Breslin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Decker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Web Science Conference</title>
				<meeting>the Web Science Conference<address><addrLine>Raleigh, NC, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010-04">April 2010</date>
			<biblScope unit="page" from="26" to="27" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Why we twitter: understanding microblogging usage and communities</title>
		<author>
			<persName><forename type="first">A</forename><surname>Java</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Finin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Tseng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis</title>
				<meeting>the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="56" to="65" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Predicting the Future With Social Media</title>
		<author>
			<persName><forename type="first">S</forename><surname>Asur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Huberman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1003.5699</idno>
		<ptr target="http://arxiv.org/pdf/1003.5699" />
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
	<note type="report_type">Arxiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Tweet, tweet, retweet: Conversational aspects of retweeting on Twitter</title>
		<author>
			<persName><forename type="first">D</forename><surname>Boyd</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Golder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lotan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">HICSS-43</title>
				<meeting><address><addrLine>Kauai, HI</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Beyond Microblogging: Conversation and Collaboration via Twitter</title>
		<author>
			<persName><forename type="first">C</forename><surname>Honeycutt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C</forename><surname>Herring</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Methodology</title>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Modeling Online Presence</title>
		<author>
			<persName><forename type="first">M</forename><surname>Stankovic</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Social Data on the Web Workshop</title>
				<meeting>the First Social Data on the Web Workshop<address><addrLine>Karlsruhe, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="volume">401</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The Delphi technique as a forecasting tool: issues and analysis</title>
		<author>
			<persName><surname>Rowe</surname></persName>
		</author>
		<author>
			<persName><surname>Wright</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Forecasting</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">4</biblScope>
			<date type="published" when="1999-10">October 1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Statistical methods for rates and proportions</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Fleiss</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1981">1981</date>
			<publisher>John Wiley</publisher>
			<pubPlace>New York</pubPlace>
		</imprint>
	</monogr>
	<note>2nd ed</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Linked Data Design Principles</title>
		<author>
			<persName><forename type="first">T</forename><surname>Berners-Lee</surname></persName>
		</author>
		<ptr target="http://www.w3.org/DesignIssues/LinkedData.html" />
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Real-time #SemanticWeb in &lt;= 140 chars</title>
		<author>
			<persName><forename type="first">J</forename><surname>Shinavier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Linked Data on the Web 2010</title>
				<meeting>Linked Data on the Web 2010</meeting>
		<imprint>
			<publisher>WWW2010</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Exploring the Wisdom of the Tweets: Towards Knowledge Acquisition from Social Awareness Streams</title>
		<author>
			<persName><forename type="first">C</forename><surname>Wagner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Extended Semantic Web Conference 2010</title>
				<meeting>Extended Semantic Web Conference 2010<address><addrLine>Heraklion, Crete</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Twarql: Tapping into the Wisdom of the Crowd</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kapanipathi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Triplification Challenge 2010 at 6th International Conference on Semantic Systems (I-SEMANTICS)</title>
				<meeting><address><addrLine>Graz, Austria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010-09-03">1-3 September 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">An Overview of SMOB 2: Open, Semantic and Distributed Microblogging</title>
		<author>
			<persName><forename type="first">A</forename><surname>Passant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Bojars</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Breslin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hastrup</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stankovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Laublet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">4th International Conference on Weblogs and Social Media, ICWSM</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="303" to="306" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
