Two-level message clustering for topic detection in Twitter

Georgios Petkos (CERTH-ITI, Thessaloniki, Greece, gpetkos@iti.gr)
Symeon Papadopoulos (CERTH-ITI, Thessaloniki, Greece, papadop@iti.gr)
Yiannis Kompatsiaris (CERTH-ITI, Thessaloniki, Greece, ikom@iti.gr)

Copyright © by the paper's authors. Copying permitted only for private and academic purposes. In: S. Papadopoulos, D. Corney, L. Aiello (eds.): Proceedings of the SNOW 2014 Data Challenge, Seoul, Korea, 08-04-2014, published at http://ceur-ws.org

Abstract

This paper presents our approach to the topic detection challenge organized by the 2014 SNOW workshop. The applied approach utilizes a document-pivot algorithm for topic detection, i.e. it clusters documents and treats each cluster as a topic. We modify a previous version of a common document-pivot algorithm by considering specific features of tweets that are strong indicators that particular sets of tweets belong to the same cluster. Additionally, we recognize that the granularity of topics is an important factor to consider when performing topic detection and we also take advantage of this when ranking topics.

1 Introduction

This paper presents our approach to the topic detection challenge organized by the 2014 SNOW workshop. Details about the challenge and the motivation behind it can be found in [Pap14]. The task did not only involve topic detection per se, but it also required the development of approaches related to the presentation of topics: topic ranking, relevant image retrieval, title and keyword extraction. We present the solutions we applied to each of these problems. Open source implementations of most of the methods used are already available in a public repository (https://github.com/socialsensor/topic-detection) and the rest will be made available soon.

The rest of the paper is structured as follows. In Section 2 we provide a brief overview of existing topic detection methods. Subsequently, Section 3 presents our approach for treating the different aspects of the challenge. Then, in Section 4 we present a preliminary evaluation of the overall approach, together with some of the topics it produced, and finally Section 5 concludes the paper.

2 Related work

At a very high level there are three different classes of topic detection methods:

1. Document-pivot methods: these approaches cluster together documents using some measure of document similarity, e.g. cosine similarity using a bag-of-words representation and a tf-idf weighting scheme. For instance, the approach in [Pet10] falls in this class and uses an incremental, threshold-based cluster assignment procedure. That is, it examines each document in turn, finds its best match among the already examined documents and either assigns it to the same cluster as its best match or initializes a new cluster, depending on whether the similarity to the best match is above some threshold. Documents are compared using cosine similarity on tf-idf representations, while a Locality Sensitive Hashing (LSH) scheme is utilized in order to rapidly retrieve the best match. A variant of this approach is utilized in this work (a minimal sketch is given after this list).

2. Feature-pivot methods: these approaches cluster together terms according to their cooccurrence patterns. For instance, the algorithm presented in [?] performs a sequence of signal processing operations on a tf-idf-like representation of term occurrence through time in order to select the most "bursty" terms. Subsequently, the distribution of appearance of the selected terms through time is modelled using a mixture of Gaussians. Eventually, a cooccurrence measure between terms is computed using the KL-divergence of the corresponding distributions and terms are clustered using a greedy procedure based on this measure.

3. Probabilistic topic models: these represent the joint distribution of topics and terms using a generative probabilistic model which has a set of latent variables that represent topics, terms, hyperparameters, etc. Probably the most commonly used probabilistic topic model, and one that has been extended in many ways, is LDA [Ble03]. LDA uses hidden variables that represent the per-topic term distribution and the per-document topic distribution. A concise review of probabilistic topic models can be found in [Ble12].

For a more thorough review of existing topic detection methods please see [Aie13].
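To make the incremental procedure of the first class concrete, the following is a minimal Python sketch of threshold-based cluster assignment, assuming documents are already represented as sparse tf-idf vectors (dicts). The linear scan stands in for the LSH index of [Pet10], and all names are illustrative rather than taken from our implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    num = sum(w * b[t] for t, w in a.items() if t in b)
    den = math.sqrt(sum(w * w for w in a.values())) * \
          math.sqrt(sum(w * w for w in b.values()))
    return num / den if den else 0.0

def incremental_clustering(docs, threshold):
    """Assign each document to the cluster of its best match among the
    previously seen documents, or open a new cluster if the best similarity
    is below the threshold. `docs` is a list of sparse tf-idf dicts."""
    clusters, assignment = [], {}
    for i, doc in enumerate(docs):
        best_j, best_sim = None, 0.0
        for j in range(i):  # a real implementation retrieves this via LSH
            sim = cosine(doc, docs[j])
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None and best_sim >= threshold:
            cid = assignment[best_j]
        else:
            cid = len(clusters)
            clusters.append([])
        clusters[cid].append(i)
        assignment[i] = cid
    return clusters
```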
Two of the most important problems for topic detection are fragmentation and merging of topics. Fragmentation occurs when the same actual story / topic is represented by many different produced topics. This is quite common in document-pivot methods, such as the one that we build upon (e.g. if the threshold is set too high). Merging is in some sense the opposite of fragmentation, i.e. it occurs when many distinct topics, not related to each other, are represented by a single topic. In the case of document-pivot methods, merging may occur when the threshold is set too low. In that case, it is possible that the occurrence of terms that are not important for a topic may result in two documents related to different topics being matched. These merged topics may either be higher-level topics of related lower-level topics or may be mixed topics of lower-level topics that are not related to each other, depending on the features on which the assignment of tweets to clusters has occurred. The first case may be acceptable depending on the required granularity of topics, but the second case is undesirable as it will produce topics that are inconsistent and of limited use to the end user. Thus, it is crucial for document-pivot methods to both do the matching based on the important textual features and to select the threshold appropriately. From an end user's perspective, fragmentation is bad because it results in redundant and overly specific topics, whereas merging has a much more negative effect as it is quite likely to produce incomprehensible topics.

3 Approach

The challenge did not only involve topic detection per se; it also involved various aspects of topic presentation and enrichment: topic ranking, title and keyword extraction, as well as retrieval of relevant tweets and multimedia. In the following we present the pursued approaches for each of these problems.

3.1 Pre-processing

The pre-processing phase of the employed solution involves duplicate item aggregation and language-based filtering. Duplicate item aggregation is carried out because tweets posted on Twitter are often either retweets or copies of previous messages. Thus, it makes sense, for computational efficiency reasons, to process in subsequent steps only a single copy of each duplicate item, while also keeping the number (and ids) of occurrences for each of them. We implemented this by hashing the text of each tweet and only keeping the text of one tweet per bucket. In practice, we observed a significant computational gain by doing this (the computational cost of the hashing procedure is very small). Indicatively, for the first test timeslot, the instance of our crawler collected 15,090 tweets and after duplicate removal we ended up with roughly half of them: 7,546 tweets in particular. It should also be noted that the hashing scheme we utilized put in the same bucket all exact duplicates but not near-duplicates. For instance, cases where a user copies a message but adds or removes some characters are not typically captured as duplicates. It is possible though to modify the pre-processing so that most such cases are also captured: e.g., one could filter out the "RT" string and the user mentions and repeat the same hashing procedure, or one could detect near-duplicates using Jaccard similarity (using also an inverted index for speed). These options were briefly tested but thorough testing and deployment has been deferred.
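As an illustration of the duplicate aggregation just described, the sketch below buckets tweets by a hash of their text; the optional normalization implements the near-duplicate variant mentioned above (stripping the "RT" marker and user mentions). Function and variable names are illustrative.

```python
import re
from collections import defaultdict

def bucket_key(text, near_duplicates=False):
    """Key used for hashing tweets into buckets. With near_duplicates=True,
    the 'RT' marker and user mentions are stripped so that simple copies
    with small edits also collide (the optional variant discussed above)."""
    if near_duplicates:
        text = re.sub(r'\bRT\b', '', text)
        text = re.sub(r'@\w+', '', text)
    return ' '.join(text.lower().split())

def aggregate_duplicates(tweets, near_duplicates=False):
    """tweets: iterable of (tweet_id, text) pairs. Returns one representative
    text per bucket together with the ids of all its duplicates, which are
    kept so that frequency counts can be used in later steps."""
    buckets = defaultdict(list)
    for tweet_id, text in tweets:
        buckets[bucket_key(text, near_duplicates)].append((tweet_id, text))
    return [(group[0][1], [tid for tid, _ in group])
            for group in buckets.values()]
```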
These merged topics may either be higher level topics of re- options were briefly tested but thorough testing and lated lower level topics or may be mixed topics of lower deployment has been deferred. level topics that are not related to each other, depend- The second step involves language detection. We use ing on the features on which the assignment of tweets a public Java implementation2 , which provided almost to clusters has occurred. The first case may be accept- perfect detection accuracy. As dictated by the chal- able depending on the required granularity of topics, lenge guidelines, we only keep content in the English but the second case is undesirable as it will produce language. This further reduces the number of tweets topics that are inconsistent and of limited use to the that needs to be processed in futher steps. For in- end user. Thus, it is crucial for document-pivot meth- stance, in the first timeslot, after the removal of non- ods to both do the matching based on the important English tweets we end up with 6,359 tweets (from 7,546 textual features and to select the threshold appropri- non-duplicate tweets that were tested). ately. From an end user’s perspective, fragmentation is bad because it results in redundant and overly spe- 3.2 Topic detection cific topics, whereas merging has a much more negative effect as it is quite likely to provide incomprehensible Having a collection of tweets (with duplicates and non- topics. English tweets removed), we now proceed to detect topics in it. In previous work [Aie13], we experimented 3 Approach with all three classes of methods. All present many challenges when applied to a dataset retrieved from The challenge did not only involve topic detection per se; it also involved various aspects of topic presenta- 2 https://code.google.com/p/language-detection/ Twitter. The main reason is that Twitter messages produced by the document-pivot procedure. Addi- are very short. For document-pivot methods this ex- tionally, a second-level cluster may also contain tweets acerbates the problem of fragmentation, as it is more that were not members of a first-level cluster and also likely, at least compared to longer documents, that second-level clusters may be created from tweets that although a pair of messages discusses the same topic, did not belong in first-level clusters. there may not be enough terms present in both of them In practice, by inspection of the results of early ex- to link them. For feature-pivot methods, the problem periments, it turns out that there still is some frag- with short documents is very similar: i.e. in short doc- mentation: some topics are represented by multiple uments it is more likely that all terms that represent second-level clusters. Therefore we seeked ways to re- a topic will not cooccur frequently enough in order duce this fragmentation. to be clustered together. In this work, we opt for a We first experimented with a semantic representation, document-pivot approach, similar to that of [Pet10], utilizing WordNet. In particular, instead of represent- but we modify it in order to take advantage of some ing the documents with a plain bag of words represen- features that can significantly improve the document tation that uses the raw textual features, we tried to clustering procedure. In particular, we recognize two use the synsets of the verbs and nouns in each doc- facts: a) tweets that contain the same URL refer to ument. 
The algorithm employed for the second-level clustering is similar to that of [Pet10] (i.e. we use an incremental, threshold-based clustering procedure and LSH for fast retrieval), but has some modifications. We take the first-level clustering into account by examining whether each new tweet to be clustered (it is reminded that all tweets are examined, whether they belong to some first-level cluster or not) has been assigned to a first-level cluster; if it has, the other tweets from the first-level cluster are immediately assigned to the same second-level cluster (and are not further examined in subsequent clustering steps). Thus, all the first-level clusters become members of the second-level clusters produced by the document-pivot procedure. Additionally, a second-level cluster may also contain tweets that were not members of a first-level cluster, and second-level clusters may be created from tweets that did not belong to first-level clusters.

In practice, by inspection of the results of early experiments, it turns out that there still is some fragmentation: some topics are represented by multiple second-level clusters. Therefore we sought ways to reduce this fragmentation. We first experimented with a semantic representation, utilizing WordNet. In particular, instead of representing the documents with a plain bag-of-words representation that uses the raw textual features, we tried to use the synsets of the verbs and nouns in each document. Such a representation could improve the results, since it would introduce some semantics in the document matching procedure and could match documents that do not contain the same raw terms. In practice, preliminary results showed that this is indeed true; however, it is also very likely to have the opposite effect, i.e. topic merging. Eventually, we dropped the idea of using WordNet features to represent documents and pursued a more moderate approach in order to deal with fragmentation.

This consisted of two things. First, we utilized lemmatized terms instead of raw terms in order to be able to better match terms. We also considered the use of stemming, but stemming is a much less reliable process and may introduce false matches. Additionally, we recognize that some features are more important than others for text matching. These features include named entities and hashtags. We use a tf-idf representation of documents and we boost the terms that correspond to named entities and hashtags by some constant factor (1.5 in our experiments; later we will also examine the effect of using non-constant boost factors). More formally, for the lemmatized term j in the ith document we compute the tf-idf weight as follows:

$$
tf\text{-}idf_{ij} =
\begin{cases}
1.5 \times tf_{ij} \times idf_j, & \text{if } j \text{ is an entity or hashtag} \\
tf_{ij} \times idf_j, & \text{otherwise}
\end{cases}
\qquad (1)
$$

where $tf_{ij}$ is the frequency of the term in the document and $idf_j$ is the inverse document frequency of the term in an independent, randomly collected corpus (more details on this corpus will be provided later). For lemmatization and named entity recognition we utilize the Stanford Core NLP library (http://nlp.stanford.edu/software/corenlp.shtml).
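A minimal sketch of the weighting of Eq. 1 follows, assuming tokenization, lemmatization and entity/hashtag detection happen upstream; the exact idf variant is not specified in the paper, so a common smoothed form is used here.

```python
import math

def boosted_tfidf(doc_terms, df, n_docs, boosted_terms, boost=1.5):
    """doc_terms: the (lemmatized) terms of one document, with repetitions;
    df: term -> document frequency in the independent reference corpus of
    n_docs documents; boosted_terms: the named entities and hashtags (Eq. 1)."""
    weights = {}
    for term in set(doc_terms):
        tf = doc_terms.count(term)
        # a common smoothed idf; the paper does not spell out its exact variant
        idf = math.log((1 + n_docs) / (1 + df.get(term, 0)))
        w = tf * idf
        weights[term] = boost * w if term in boosted_terms else w
    return weights
```

As will be seen in Section 3.3, the high-level clustering of Eq. 2 reuses this scheme, with a frequency-dependent boost passed in place of the constant 1.5.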
Finally, as we mentioned before, the threshold value is an important parameter of the process. We opt for a high threshold (0.9) so that there is no merging, at the cost of some fragmentation (despite the modifications that we made to avoid it). As will be shown in a later section, where we present some empirical results, the produced topics are quite clear, meaning that there is no merging, and come at the appropriate level of granularity.

3.3 Ranking

The challenge required that only 10 topics per timeslot are returned. The preliminary tweet grouping step resulted in a few hundred first-level topics (483 for the first timeslot). When we apply the document-pivot clustering procedure we end up with considerably more second-level topics (2,669 for the first timeslot using a threshold of 0.9). Although, as verified by inspection, there still is some fragmentation, the number of actual topics is quite large. Thus, we need to rank the produced second-level topics in order to select the most important ones.

Initially, we considered simply ranking the topics according to the number of documents they include and the number of retweets these documents receive. However, we realized that the granularity and hierarchy of topics is also important for topic ranking. As already discussed, some topics may be considered subtopics of larger topics, and it is reasonable that the attention that a larger topic attracts should affect the ranking of related finer topics. For instance, the most popular high-level topic in our corpus is the events in Ukraine (this was determined in an early exploratory stage of our study by examining, for each term, the ratio of its likelihood of appearance in the test corpus to that in an independent, randomly collected corpus; for more details on this likelihood please see the section on title extraction. The term "Ukraine" had the highest ratio). It makes sense then that although a topic about some event in Ukraine may be linked to as many documents as another topic about, say, a concert, considering the overall attention that the events in Ukraine received, the Ukraine-related topic should be ranked higher.

In order to take advantage of this, we apply the following procedure. We perform a new clustering of the documents, but this time we further boost the weight of hashtags and entities. The boost factor is not the same for each entity and hashtag; instead, it is linear in its frequency of appearance in the corpus. More formally, the tf-idf weights are computed as follows (cf. Eq. 1):

$$
tf\text{-}idf_{ij} =
\begin{cases}
cf_j \times tf_{ij} \times idf_j, & \text{if } j \text{ is an entity or hashtag} \\
tf_{ij} \times idf_j, & \text{otherwise}
\end{cases}
\qquad (2)
$$

where $cf_j$ is the frequency of appearance of the entity or hashtag j in the test corpus. This significantly reduces the number of produced topics (1,345 for the first timeslot, whereas 2,669 topics were produced from the second-level clustering for the same timeslot) and by inspection it appears that it reduces fragmentation a lot. Importantly, merging takes place, but only related topics are merged into clean higher-level topics. For example, the algorithm manages to put all documents related to Ukraine in the same cluster. Subsequently, we rank these high-level topics by the number of documents in the corresponding clusters and link each second-level topic produced by the initial document-pivot procedure to the corresponding high-level topic. The linking is carried out by finding which high-level topic contains the largest number of tweets of each second-level topic. Finally, we rank all second-level topics belonging to the same high-level topic according to the number of tweets they contain. Eventually, we have a two-level clustering, one for high-level topics and one for the low/second-level topics within each of them.

In order to select the 10 topics to return, we apply a simple heuristic procedure in which the number of low-level topics selected from each high-level topic drops linearly with its rank. More specifically, we apply the following procedure. First we examine only the top-ranked high-level topic and select a single low-level topic from it. Then, we examine the top two high-level topics and select one low-level cluster from each of them, and so on, until we obtain 10 topics. Of course, selected second-level topics are not reconsidered for selection when a high-level topic is revisited during the described procedure. Also, in case there are not enough low-level topics in some high-level topic, we just skip it.
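The selection procedure can be read as a widening round-robin over the ranked high-level topics. A sketch, assuming each high-level topic is given as a rank-ordered list of its low-level topics:

```python
def select_topics(high_level, k=10):
    """high_level: high-level topics in rank order, each a list of its
    low-level topics, also in rank order. Pass 1 takes one topic from the
    top high-level topic, pass 2 one from each of the top two, and so on;
    already-selected topics and exhausted high-level topics are skipped."""
    selected = []
    taken = [0] * len(high_level)  # low-level topics used per high-level topic
    width, progress = 1, True
    while len(selected) < k and progress:
        progress = False
        for h in range(min(width, len(high_level))):
            if len(selected) == k:
                break
            if taken[h] < len(high_level[h]):
                selected.append(high_level[h][taken[h]])
                taken[h] += 1
                progress = True
        width += 1
    return selected
```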
It should also be noted that we attempted to produce high-level topics without the additional boosting of entities or hashtags, either by lowering the similarity threshold or by clustering second-level super-documents, but both these approaches resulted in mixed topics. It appears that these mixed topics were formed based on less important textual features which are more common across different topics. On the other hand, the applied approach of boosting entities and hashtags in a more aggressive manner did not produce any mixed topics and did indeed manage to surface the higher-level topics.

3.4 Title extraction

We first split the text of each tweet in the cluster into sentences to obtain a set of candidate titles. Clearly, splitting the text into sentences makes sense, as the title has to be a coherent piece of language. To obtain sentence separation we again use the Stanford NLP library. Having an initial set of candidate titles, we subsequently compute the Levenshtein distance between each pair of candidate titles in order to reduce the number of actual candidates. In the final step we rank the candidate titles using both their frequency and their textual features. The score of a title is the product of its frequency and the average likelihood of appearance of the terms that it contains in an independent corpus. The likelihood of appearance of a term t was obtained using a smoothed estimate in order to account for terms not appearing in the independent corpus:

$$
p(t) = \frac{c_t + 1}{N + V}
\qquad (3)
$$

where $c_t$ is the count of appearances of t in the independent corpus, N is the total number of (non-unique) terms in the corpus and V is the vocabulary size (larger than the number of unique terms in the corpus). The corpus that was utilized to obtain these estimates was collected by randomly sampling from the Twitter streaming API and consisted of 1,954,095 tweets. It should also be noted that removed candidates increase the frequency count of their most similar candidate, and that, despite the fact that we do not process duplicate items, the count of duplicates removed for each processed item contributes to the frequency of the sentences extracted from it.
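The scoring just described is straightforward to state in code: Eq. 3 gives the smoothed term likelihood, and a candidate title's score is its frequency times the average likelihood of its terms. Whitespace tokenization is a simplifying assumption of the sketch.

```python
def term_likelihood(term, counts, n_terms, vocab_size):
    """Smoothed likelihood of Eq. 3: p(t) = (c_t + 1) / (N + V).
    counts: term -> count in the independent corpus; n_terms = N; vocab_size = V."""
    return (counts.get(term, 0) + 1) / (n_terms + vocab_size)

def title_score(sentence, frequency, counts, n_terms, vocab_size):
    """Candidate title score: frequency in the cluster times the average
    term likelihood in the independent reference corpus."""
    terms = sentence.lower().split()
    if not terms:
        return 0.0
    avg = sum(term_likelihood(t, counts, n_terms, vocab_size)
              for t in terms) / len(terms)
    return frequency * avg
```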
3.5 Keyword extraction

The keyword extraction process is similar to the title extraction process. However, instead of complete sentences, we now examine either noun phrases or verb phrases. We decided to work with noun phrases and verb phrases instead of unigram terms because they generally provide a less ambiguous summary of topics. In particular, short phrases can be more meaningful, regardless of the order in which they appear, as compared to single terms. For instance, let us consider one of the topics in the first timeslot of the test set. That topic is about Ukrainian journalists publishing a number of documents found in president Yanukovich's house. The set of keywords we produced was: "secret documents", "Yanukovich 's estate", "Ukraine euromaidan", "was trying", "president 's estate". One can see that regardless of the sequence of these phrases, one can grasp a fairly good idea of the topic. If however we used single terms, it could be possible, depending on the order of terms, that some of them may be incorrectly associated, e.g. "secret" could be associated with the term "estate" instead of the term "documents".

Eventually, in order to select the keywords, we rank them according to their frequency in the clustered documents and their likelihood of appearance in an independent corpus, as we did for the titles. However, for keyword extraction we are not limited to selecting a single candidate, as is the case for title extraction. Thus, we need a mechanism for selecting the number of top-ranked candidates to keep. We utilize a "largest gap" heuristic to do this. That is, after ranking the candidate keywords, we compute the score difference between subsequent candidates, find the position in the ranked list with the largest difference and select all terms up to that position.

At the final step of the process, we add to the set of keywords the set of most important entities. These are determined using a similar "largest gap" heuristic and we only add them if they do not already appear as part of a phrase in the set of keywords. Finally, it should be noted that we use the Stanford NLP library to obtain the noun and verb phrases. However, instead of doing a full parsing of the texts, which would be computationally costly, we perform part-of-speech tagging and apply some heuristic rules to obtain noun and verb phrases from the part-of-speech tags. More particularly, we identify sequences of terms consisting only of nouns, adjectives and possessive endings (e.g. "'s") as noun phrases, and we identify sequences of terms consisting only of verbs as verb phrases.
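The "largest gap" cut-off admits a direct sketch: rank the scored candidates, find the largest drop between consecutive scores, and keep everything above it. Names are illustrative.

```python
def largest_gap_cutoff(scored_candidates):
    """scored_candidates: list of (candidate, score) pairs. Returns the
    top-ranked candidates up to the position of the largest score
    difference between consecutive candidates."""
    ranked = sorted(scored_candidates, key=lambda cs: cs[1], reverse=True)
    if len(ranked) < 2:
        return [c for c, _ in ranked]
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps))  # largest drop occurs after position `cut`
    return [c for c, _ in ranked[:cut + 1]]
```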
3.6 Representative tweets selection

The challenge also requires that a number of representative and, as much as possible, diverse tweets is provided for each topic. The set of related tweets can be easily obtained in our approach, since we utilize a document-pivot method. Regarding diversity, the duplicate removal step that we apply at the first stage of our processing partly takes care of this requirement. However, there are still some near-duplicates that were not captured by the duplicate removal step. Additionally, to introduce as much diversity as possible, we make sure that all replies from the topic's cluster are included in the set of representative tweets and additionally we include the most frequent tweets (making sure that the total number of selected tweets is at most 10).
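A sketch of this selection, under the assumption that reply status and aggregated duplicate counts are available per tweet id: replies are included first, then the most frequent tweets fill the remaining slots, capped at 10.

```python
def representative_tweets(cluster, is_reply, frequency, limit=10):
    """cluster: tweet ids in the topic's cluster; is_reply: id -> bool;
    frequency: id -> number of aggregated duplicates. Replies are kept
    for diversity, then the most frequent remaining tweets, up to `limit`."""
    replies = [t for t in cluster if is_reply(t)]
    others = sorted((t for t in cluster if not is_reply(t)),
                    key=lambda t: frequency.get(t, 1), reverse=True)
    return (replies + others)[:limit]
```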
3.7 Relevant image extraction

We retrieve relevant images by applying a very simple procedure. In particular, if the tweets associated with a topic contain the URL of some images, then we find the most frequent image and return it. Otherwise, we issue a query to the Google search API, searching by the title of the topic, and associate to the topic the first image returned. In a few cases, this did not return any results; then we issue a further query, this time using the most popular keyword. It should be noted though that this approach has a limitation: the Google search API allows only a specific number of queries per day and thus we had to issue repeated queries for a long period of time in order to obtain results for each image. A potentially better option in that respect would be to use a different search API, such as Twitter's.

4 Evaluation

In the following we examine different aspects of the applied approach and then we comment on the quality of the produced topics. Figure 1 displays the number of tweets before and after duplicate aggregation and language filtering, as well as the number of first-level and second-level clusters produced. One thing to note is that for all timeslots there is a significant reduction in the number of tweets to be clustered after duplicate aggregation and language filtering. Additionally, for all timeslots there is a number of first-level clusters, typically a few hundred, each of which contains at least two tweets, meaning that immediately and without resorting to any complicated clustering operations we have obtained initial clusterings for a significant part of the tweets to be clustered. The number of second-level topics is typically larger though, as tweets that did not form first-level clusters also participate in the second-level clustering procedure. It is also interesting to note that the computational cost of the complete procedure for each timeslot is not that high. In particular, the complete set of operations (first- and second-level clustering, ranking, title and keyword extraction, as well as relevant image retrieval) took on average 65.33 seconds per timeslot on a machine with moderate computational resources (Intel Q9300 CPU running at 2.5 GHz and 4 GB of RAM).

[Figure 1: Number of tweets before and after duplicate aggregation and language filtering, as well as the number of first-level and second-level clusters produced. The plot shows, per timeslot (1-100), the counts of original tweets, tweets after duplicate aggregation, tweets after language filtering, first-level clusters and second-level clusters.]

Table 1 presents the ten topics produced by our approach for the first test timeslot. As a first remark, it appears that all topics are related to a distinct event, which is fairly well represented by both the title and the keywords. It should be noted though that, in some cases, the set of keywords may not be enough by itself to provide a very clear picture of the essence of the story. For instance, in the story about the Ukrainian parliament voting to send Yanukovich to The Hague, the keyword "Hague" is missing, although it should be included. Thus, the keyword extraction process may be improved by appropriately changing the mechanism that automatically selects the number of phrases and entities to return (across the complete test collection, the minimum number of keywords retrieved was 1, the maximum was 5 and the average was 2.625). Also, due to the heuristic that we applied to rapidly retrieve noun and verb phrases, we occasionally have mixed noun and verb phrases, e.g. the phrase "Ukrainian parliament votes". The title, on the other hand, makes perfect sense and is in all displayed topics (and most other topics as well) very indicative of the topic. Finally, the multimedia retrieved are sometimes very relevant and sometimes less so; e.g., for the topic about the cost of Yanukovich's house, the retrieved image is the front page of some newspaper.

5 Conclusions

In this paper we presented the approach pursued by our team for participating in the SNOW 2014 Data Challenge. In short, we have utilized a document-pivot approach, but we have taken advantage of features that allow us to improve the quality of the detected clusters. In particular, we have taken advantage of commonly appearing URLs and of reply relationships between tweets, formulating a two-level clustering procedure. We have tuned our clustering so that it provides a set of topics at the required granularity, i.e. low-level stories rather than high-level topics, at the cost of some fragmentation. In practice, this provided very good topics. Subsequently, we apply a number of NLP techniques in order to enrich the representation of topics: we use sentence splitting for title extraction and we use noun and verb phrase extraction for identifying key phrases. Additionally, we identify that the ranking of a topic should be related to the importance of any larger topic that it may be linked to, and we apply an appropriate procedure in order to achieve a two-level ranking of topics.

Acknowledgments

This work is supported by the SocialSensor FP7 project, partially funded by the EC under contract number 287975.

References

[Aie13] L. Aiello, G. Petkos, C. Martin, D. Corney, S. Papadopoulos, R. Skraba, A. Goker, I. Kompatsiaris, A. Jaimes. Sensing trending topics in Twitter. IEEE Transactions on Multimedia, 15(6):1268–1282, Oct 2013.

[Ble03] D. Blei, A. Ng, M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[Ble12] D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.

[Cor09] T. Cormen, C. Leiserson, R. Rivest, C. Stein. Introduction to Algorithms, Third Edition. The MIT Press, 2009.

[Pap14] S. Papadopoulos, D. Corney, L. Aiello. SNOW 2014 Data Challenge: Assessing the Performance of News Topic Detection Methods in Social Media. Proceedings of the SNOW 2014 Data Challenge, 2014.

[Pet10] S. Petrović, M. Osborne, V. Lavrenko. Streaming First Story Detection with Application to Twitter. HLT: Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010.

[Wen10] J. Weng, E. Lim, J. Jiang, Q. He. TwitterRank: Finding Topic-sensitive Influential Twitterers. Proceedings of the Third ACM International Conference on Web Search and Data Mining, 2010.
Title: Fight for the right to be free!
Keywords: Ukraine, madonna, free !! fight fascism
Relevant tweet: @Madonna: Fight for the right to be free!! Fight Fascism everywhere! Free Venezuela the Ukraine&Russia #artforfreedom

Title: Ukraine's toppling craze reaches even legendary Russian commander, who fought Napoleon
Keywords: Legendary russian commander, Ukraine
Relevant tweet: @RT com #Ukraine toppling craze reaches even legendary Russian commander,who fought Napoleon http://on.rt.com/izqunf

Title: Ukraine parliament votes to send Yanukovych to The Hague
Keywords: Ukraine parliament votes, Yanukovych
Relevant tweet: #Ukraine parliament votes to send Yanukovych to The Hague

Title: Ukraine's president spent $2.3m on dining room decor, $17k tablecloths, $1m to water his lawn
Keywords: Ukraine 's president, dining room decor
Relevant tweet: #Ukraine's president spent $2.3M on dining room decor, $17K tablecloths, $1M to water his lawn

Title: Journalists in Ukraine are in the process of uploading 1000s of secret documents found at Yanukovich's estate
Keywords: Secret documents, Yanukovich 's estate, Ukraine euromaidan, was trying, president 's estate
Relevant tweet: The #YanukovychLeaks is up! Here are the documents recovered at the ousted presidents estate. #Ukraine #euromaidan

Title: Mt. Gox takes Bitcoin exchange offline as currency woes mount, does not say when transactions/withdawals will resume
Keywords: Gox, bitcoin exchange offline, currency woes
Relevant tweet: Mt. Gox takes #Bitcoin exchange offline as currency woes mount, http://fxn.ws/1ppoMGk @joerogan

Title: Can't decide if I want to write this week's Most Googled Song about Seth Myers Jimmy Fallon or Bitcoin. Thoughts??
Keywords: Bitcoin, Seth
Relevant tweet: Can't decide if I want to write this week's Most Googled Song about Seth Myers & Jimmy Fallon or Bitcoin... Thoughts??

Title: Syria aid still stalled after UN.
Keywords: Melawanlupa syria aid, resolution, stalled
Relevant tweet: #MelawanLupa RT #Syria #aid still stalled after #UN. resolution http://reut.rs/1mwaqlh

Title: Remarks at today's UN General Assembly briefing on the Humanitarian Situation in Syria
Keywords: Today 's un general assembly briefing
Relevant tweet: Remarks by @AmbassadorPower at today's UN General Assembly briefing on the Humanitarian Situation in #Syria: http://go.usa.gov/Bt2d

Title: Usmnt's friendly vs Ukraine on March 5 moved to Cyprus, according to Ukraine's Football Federation.
Keywords: Usmnt 's friendly, Ukraine's football federation
Relevant tweet: #USMNT's friendly vs Ukraine on March 5 moved to Cyprus, according to Ukraine's Football Federation. http://foxs.pt/NuDSvq

Table 1: The 10 topics produced by our approach for the first timeslot.