<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">How to Stay Up-to-date on Twitter with General Keywords</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Mandy</forename><surname>Roick</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Maximilian</forename><surname>Jenders</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Ralf</forename><surname>Krestel</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">Hasso Plattner Institute</orgName>
								<address>
									<addrLine>Prof.-Dr.-Helmert-Str. 2-3</addrLine>
									<postCode>14482</postCode>
									<settlement>Potsdam</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">Digital Insights</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">How to Stay Up-to-date on Twitter with General Keywords</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3FE92CD0C9DF5CBA4E38DF06A2747F41</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T23:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Microblogging platforms make it easy for users to share information by publishing short personal messages. However, users are not only interested in sharing information, but even more so in consuming it. As a result, they face new challenges when retrieving information on microblogging platforms. In this paper, we present a query expansion method based on latent topics to support users interested in topical information. Similar to news aggregator sites, our approach identifies subtopics of a given query and provides the user with a quick overview of the topics discussed on the microblogging platform. Using a collection of microblog posts from Twitter, we compare the quality of the search results returned by our algorithm with those of a baseline approach and of a state-of-the-art microblog-specific query expansion method. We introduce a novel, semi-automatic evaluation strategy based on expert Twitter users. In contrast to existing query expansion methods, our approach can also be used to aggregate and visualize topical query results based on the computed topic models, while achieving competitive results for traditional keyword-based search.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Searching Microblog Posts</head><p>Along with the development of Web 2.0, users have increasingly become content providers. Microblogging platforms are a good example of this trend. These platforms allow users to share short text messages, images, or links with interested observers (followers) <ref type="bibr" target="#b4">[5]</ref>. Microblogging platforms such as Facebook, Tumblr, or Twitter report constantly increasing numbers of users. According to Twitter's website, for example, the platform has 284 million monthly active users and 500 million microblog posts shared daily, an average of about 6,000 tweets per second. However, not all of Twitter's users share content: 44% of them have never posted anything<ref type="foot">1</ref>. These users are only interested in consuming content, so filtering and searching microblog posts becomes an increasingly important task.</p><p>In 2011, Twitter's search engine processed about 1.6 billion search queries daily. An analysis of search behavior <ref type="bibr" target="#b9">[10]</ref> shows that 49% of Twitter users search for timely information, such as trending topics or news-related information, 26% report an interest in social information about other users, and 36% report searching for specific topics, such as "astronomy". Since then, searching microblog posts has become part of the research agenda. The Text REtrieval Conference<ref type="foot" target="#foot_0">2</ref> (TREC) opened a Microblog track in 2011 addressing real-time search on microblogging platforms. In 2014, Twitter expanded its search service to allow users to search all tweets ever posted<ref type="foot" target="#foot_1">3</ref>.</p><p>In contrast to Web search, searching microblogs poses some characteristic challenges <ref type="bibr" target="#b9">[10]</ref>. 
To cope with the restricted length of tweets, Twitter users not only use abbreviations and emoticons but also employ hashtags, which are explicit, user-specified topic markers. Another way to condense information into a tweet is to link to a web page with more information on the topic; hence, many tweets contain URLs. However, these instruments are user-specified, and their quality and usefulness for search depend on how users adopt them. URLs, for instance, often link to images or videos, which are difficult for a machine to interpret. Hashtags are highly inconsistent, because users spell and interpret them differently: "#4YearsAgo5StrangersBecame5Brothers", "#ThankYou1DYouChangedOurLives", and "#4YearsOf1D" all refer to the fourth anniversary of the band One Direction. For users who do not follow such content on Twitter every day, it is difficult to pose queries that match the language used in tweets. The massive number of tweets posted every day is an additional challenge for new users who want an overview of the content on Twitter. To bridge the gap between the language of users who post tweets and that of users who pose queries, we introduce a new query expansion approach that enables topic-based search. It improves the search experience for people looking for topical and news-like information on Twitter with rather general keywords such as "politics" or "basketball".</p><p>While many researchers propose query expansion algorithms for microblogging platforms <ref type="bibr" target="#b8">[9]</ref>, <ref type="bibr" target="#b10">[11]</ref>, <ref type="bibr" target="#b0">[1]</ref>, <ref type="bibr" target="#b3">[4]</ref>, none of them addresses the search for specific topics. 
Currently, Twitter presents search results in a list view showing the content of the tweets, their authors, the time that has passed since they were posted, and, if a tweet links to a news page, a short summary of that page. The ranking is mainly based on exact query term matching, recency, and popularity. While query expansion can help to overcome the limitations of exact query term matching, topical queries usually cover many subtopics that a user might be interested in, and gaining an overview of these subtopics from a ranked list is difficult. Since Twitter behaves similarly to a news medium <ref type="bibr" target="#b5">[6]</ref>, we propose to also use the topics computed for query expansion to cluster tweets about similar topics. An application could then display a user interface similar to platforms such as Google News<ref type="foot" target="#foot_2">4</ref>, where individual news articles are aggregated and categorized.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Many approaches use topic models for query expansion in classic information retrieval <ref type="bibr" target="#b12">[13]</ref>, but few do so for microblog posts. Yan et al. <ref type="bibr" target="#b11">[12]</ref> present the biterm topic model (BTM), an alternative to LDA designed specifically for short texts. Instead of modeling the generation of documents, BTM models the generation of biterms (unordered word pairs that co-occur in short texts) and assumes that each biterm is drawn from a single topic. Closely related to our work is the automatic topic-focused monitor (ATM) <ref type="bibr" target="#b6">[7]</ref>, which monitors tweets relevant to a given topic. While the strength of ATM lies in monitoring tweets over time, our search approach selects keywords directly and does not need to know the search query in advance to sample correctly.</p><p>Several approaches for query and document expansion have been proposed in the context of the TREC Microblog Track. For example, Wang et al. <ref type="bibr" target="#b10">[11]</ref> combine query expansion based on pseudo-relevance feedback with document expansion through the URLs contained in some tweets. They use these expansions only to break ties between tweets with the same retrieval score, and they found that the expansions did not improve the ranking but led to worse results. Bandyopadhyay et al. <ref type="bibr" target="#b0">[1]</ref> aim to improve weak queries (e.g., short tweets whose spelling and grammar differ from those of a regular search query) and present a query expansion algorithm based on pseudo-relevant web documents. 
The algorithm submits the original queries to the Google search API and expands them with the most frequent terms in the titles and snippets returned by the API. Outside of TREC, Massoudi et al. <ref type="bibr" target="#b8">[9]</ref> developed a retrieval model for queries that contain trending topics. They extend the model with quality indicators, such as recency and followers, as well as with query expansion based on term co-occurrence. Efron et al. <ref type="bibr" target="#b3">[4]</ref> describe a document expansion approach using a language model that includes a weighted probability of a word given the expanded document; the expansion is obtained by issuing the document as a pseudo-query against the document collection. Liang et al. <ref type="bibr" target="#b7">[8]</ref> use pseudo-relevance feedback query expansion based on language models and employ temporal re-ranking to discover recent but relevant information for a query in microblogs. Chua et al. <ref type="bibr" target="#b2">[3]</ref> use topic models to extract representative tweets from a stream for event summarization.</p><p>The presented approaches mostly aim to expand the given query to match the language used in short microblog posts <ref type="bibr" target="#b0">[1]</ref>, <ref type="bibr" target="#b10">[11]</ref>, or to expand the microblog posts to match the language used in a query <ref type="bibr" target="#b10">[11]</ref>, <ref type="bibr" target="#b3">[4]</ref>. In this paper, we concentrate on intentionally very general queries, and we aim to expand them so as to provide a good overview of the trending subtopics at different levels of granularity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Topic-Based Query Expansion</head><p>We want to support users searching for general topics, such as "politics" or "Ukraine". To this end, we propose a query expansion approach based on topic modelling. The topic models are learned daily from a data set of crawled and preprocessed tweets and are later used to expand user-specified queries. Figure <ref type="figure" target="#fig_0">1</ref> shows our system's architecture. Crawling tweets and constructing the topic model are done offline, while the topic model is used online to expand queries at query time. If a query term is unknown, i.e., not present in the offline-computed topic model, we fall back to standard keyword search; in practice, this hardly ever happens for the general queries we target. Furthermore, we address recency and popularity on Twitter indirectly by computing a new topic model every day, so that the model reflects current trends.</p></div>
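<div xmlns="http://www.tei-c.org/ns/1.0"><p>The offline/online split can be illustrated with a minimal Python sketch. It assumes that the offline pipeline (crawling, preprocessing, and LDA, not shown) has already produced, for every known term, a list of expanded queries, and it handles single-term queries only, as in our experiments. The classes DailyTopicModel and TopicExpansionService are hypothetical names chosen for illustration, not part of our implementation.</p><eg>
```python
import datetime
from dataclasses import dataclass, field


@dataclass
class DailyTopicModel:
    """Hypothetical result of one offline run (crawl, preprocess, LDA) for one day.

    expansions maps a known query term to its expanded queries,
    one list of terms per selected topic.
    """
    day: datetime.date
    expansions: dict = field(default_factory=dict)


class TopicExpansionService:
    """Hypothetical online component: expand a query with the newest daily model."""

    def __init__(self):
        self.models = {}  # datetime.date -> DailyTopicModel

    def add_model(self, model):
        # Called by the offline pipeline once per day.
        self.models[model.day] = model

    def current_model(self):
        # The most recent model reflects the latest trends.
        return self.models[max(self.models)] if self.models else None

    def expand(self, term):
        """Return the list of queries to run for a single-term query."""
        model = self.current_model()
        if model is None or term not in model.expansions:
            # Unknown term: fall back to standard keyword search.
            return [[term]]
        return model.expansions[term]


if __name__ == "__main__":
    service = TopicExpansionService()
    service.add_model(DailyTopicModel(
        datetime.date(2014, 10, 20),
        {"politics": [["politics", "obama", "speech"], ["politics", "vote", "elect"]]},
    ))
    print(service.expand("politics"))  # two expanded queries
    print(service.expand("curling"))   # fallback: [['curling']]
```
</eg><p>Always answering with the newest model is one simple way to realize the daily refresh; the fallback branch corresponds to plain keyword search for terms the current model has never seen.</p></div>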
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Topic Model Construction</head><p>We use latent Dirichlet allocation (LDA) <ref type="bibr" target="#b1">[2]</ref> to compute a topic model<ref type="foot" target="#foot_3">5</ref> ahead of search time, e.g., once a day. The resulting topic model can then be used to infer the topic distribution \(\theta_d\) of a new tweet \(d\). Given a query, the most probable topics can be determined using \(\Phi\), the topic-word distributions. Table <ref type="table">1</ref> shows the 10 most probable topics of a one-day topic model together with the probability of each topic given the query word "politics".</p><p>With LDA, the number of topics \(K\) has to be specified in advance. A larger \(K\) splits topics, which allows ambiguous topics to be separated. However, once no ambiguous topics remain, increasing \(K\) further splits up homogeneous topics. For query expansion, it is important both that different topics can be found for a term and that the topics found are not ambiguous, as ambiguity could lead to topic shifts. We evaluated different values of \(K\) on a validation set, which is described in Section 4.</p><figure type="table"><head>Table 1:</head><label>1</label><figDesc>Top 10 topics from October 20, 2014 for the query "politics"</figDesc><table><row><cell>p(i|q)</cell><cell>8 most probable stemmed words</cell></row><row><cell>0.167</cell><cell>obama ebola tcot speech presid reason net ban</cell></row><row><cell>0.155</cell><cell>ukip vote ward tori parti peopl nh elect</cell></row><row><cell>0.115</cell><cell>bjp part scienc india modi biblic read congress</cell></row><row><cell>0.089</cell><cell>vote elect voter earli blue texa gop todai</cell></row><row><cell>0.072</cell><cell>gate gamer gamerg peopl women stop game bulli</cell></row><row><cell>0.069</cell><cell>presid indonesia jokowi minist presiden japan russia</cell></row><row><cell>0.052</cell><cell>isi turkei kurd koban fight kill iran syria</cell></row><row><cell>0.044</cell><cell>ari support pakistan ban stand pti khan wesupportari</cell></row><row><cell>0.043</cell><cell>energi price compani tax loan pai servic power</cell></row><row><cell>0.036</cell><cell>class question teacher answer write english word learn</cell></row></table></figure></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Query Expansion</head><p>We are interested in the most probable topics for all words of a query \(q\), i.e., we search for topics \(i\) for which \(p(z = i \mid q)\) (in the following, \(p(i \mid q)\)) is maximal. 
During Gibbs sampling, we sample a topic assignment \(z\) for every occurrence of each word \(w\) in the vocabulary \(W\). We use these samples of \(z\) to estimate \(p(i \mid q) \approx \frac{\sum_{w \in q} n(i,w)}{\sum_{w \in q} n(w)}\), where \(n(i,w)\) is the number of occurrences of word \(w\) assigned to topic \(i\) and \(n(w)\) is the total number of occurrences of \(w\) in the corpus. In other words, \(p(i \mid q)\) is the number of times the query words were assigned to topic \(i\) divided by the total number of occurrences of the query words in the corpus. Although our test queries contain only a single term, this formulation also holds for queries with multiple words. For the query expansion, we then use each topic's best representatives, i.e., for a topic \(i\), the most probable words according to \(p(w \mid i) = \phi_{i,w}\). The quality of the query expansion depends heavily on the number of topics the query is expanded with and on the number of words chosen from each topic. We optimized these model parameters on a validation set (see Section 4). The best results were achieved with \(K = 200\) topics, 10 expansion terms per topic, and a threshold of \(p(i \mid q) &gt; 0.05\) for including a topic in the expansion. For the example in Table <ref type="table">1</ref>, the top 7 topics would be used for query expansion, while the remaining three are disregarded.</p></div>
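<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the estimation of \(p(i \mid q)\) concrete, the following self-contained Python sketch implements a plain collapsed Gibbs sampler for LDA with symmetric priors, estimates \(p(i \mid q)\) from the final sample as described above, and selects the topics above the 0.05 threshold together with their 10 most probable words. It is a simplified stand-in for illustration only: our experiments use Mallet with an asymmetric \(\alpha\), and all function names here are hypothetical.</p><eg>
```python
import random
from collections import Counter


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.05, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors alpha and beta.

    docs is a list of token lists. Returns n_kw (per-topic word counts n(i, w)
    from the final sample), n_k (tokens per topic), and the vocabulary size V.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})
    n_dk = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    n_kw = [Counter() for _ in range(num_topics)]  # n(k, w)
    n_k = [0] * num_topics                         # tokens in topic k
    z = []
    for d, doc in enumerate(docs):                 # random initialization
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for pos, w in enumerate(doc):
                k = z[d][pos]
                n_dk[d][k] -= 1                    # remove the current assignment
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = k | all other assignments), unnormalized.
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                           for j in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][pos] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_kw, n_k, V


def topic_given_query(n_kw, query_words):
    """p(i|q): share of the query words' occurrences assigned to topic i."""
    per_topic = [sum(counts[w] for w in query_words) for counts in n_kw]
    total = sum(per_topic)                         # n(q): all occurrences of the query words
    return [c / total for c in per_topic] if total else None


def select_topics(p_iq, threshold=0.05):
    """Indices of topics with p(i|q) above the threshold, most probable first."""
    chosen = [i for i, p in enumerate(p_iq) if p > threshold]
    return sorted(chosen, key=lambda i: p_iq[i], reverse=True)


def expand_query(n_kw, n_k, V, query_words, beta=0.05, threshold=0.05, num_terms=10):
    """One expanded query per selected topic: (topic, p(i|q), top num_terms words)."""
    p_iq = topic_given_query(n_kw, query_words)
    if p_iq is None:
        return [(None, 1.0, list(query_words))]   # unknown words: keyword search
    expansions = []
    for i in select_topics(p_iq, threshold):
        # phi_{i,w} = (n(i, w) + beta) / (n(i) + V * beta)
        phi = {w: (c + beta) / (n_k[i] + V * beta) for w, c in n_kw[i].items()}
        top = sorted(phi, key=phi.get, reverse=True)[:num_terms]
        expansions.append((i, p_iq[i], top))
    return expansions
```
</eg><p>Applied to the probabilities in Table 1, select_topics with the 0.05 threshold keeps exactly the top 7 topics. Because \(\phi_{i,w}\) is monotone in \(n(i,w)\) within a topic, ranking words by \(\phi\) or by raw counts gives the same expansion terms.</p></div>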
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>To assess the ability of our algorithm to retrieve topically relevant tweets, we propose a novel, semi-automatic evaluation strategy that produces high-quality labeled data by utilizing expert Twitter users. In addition, we present example queries together with their expansions based on our topic model as anecdotal evidence of how our algorithm can help users get an overview of the subtopics of a general query.</p><p><hi rend="bold">Data Set.</hi> Most existing annotated data sets, such as the Tweets2011 corpus used for the TREC Microblog Track<ref type="foot" target="#foot_4">6</ref>, focus on detailed information needs and do not include general topical queries. Therefore, we created our own data set with semi-automatic annotations. We chose two general topical queries, "sports" and "politics", and for each of them two more specific ones: "baseball" and "basketball", and "Ebola" and "Ukraine". To find relevant tweets for each query, we handpicked 10 expert Twitter users who primarily tweet on the corresponding topic. Besides the topical relevance of their tweets, we used popularity and the number of tweets to select these users. For "politics", for example, these users were @BBCPolitics, @CNNPolitics, @NicRobertsonCNN, @KevinBohnCNN, @TheWhiteHouse, @politico, @thehill, @HuffPostPol, @CBSPolitics, and @BarackObama. We then crawled these users' tweets together with the 1% sample of general tweets available through the Twitter API. Only the expert users' tweets are annotated as relevant for the respective queries. This leads to low absolute precision values, because some tweets marked as non-relevant are actually relevant, whereas tweets marked as relevant are largely correct. The measured precision therefore underestimates the actual precision but indicates the tendency of each method. We constructed two data sets, one for validation and one for testing. 
Each data set consists of one day of Twitter data to learn the topic model and the subsequent day for validation or testing (Oct. 21 and Dec. 4, 2014; each day comprises 1.4 million tweets, i.e., 1% of all tweets). On average, our expert users published 196 tweets per query per day.</p><p><hi rend="bold">Baseline Approach.</hi> As baseline approach BL, we search for the given queries without query expansion. Similar to Twitter's search engine, we use BM25 to search for the query terms both in the tweets and in linked content. In contrast to Twitter's search, our ranking does not incorporate recency or popularity.</p><p>In addition to the baseline, we compare our search results with those of a competing query expansion algorithm that is designed for microblogging platforms and based on word co-occurrence <ref type="bibr" target="#b8">[9]</ref>. Its authors report improved search results over standard query expansion with pseudo-relevance feedback.</p><p><hi rend="bold">Topic-Based Approach.</hi> For each initial query, our topic-based approach yields a set of expanded queries according to our topic model. We set \(\alpha\) asymmetric with initial value \(\alpha_i = K \cdot 0.01\) for all \(i \in \{1, 2, \ldots, K\}\). In contrast to \(\alpha\), \(\beta\) is symmetric with initial value \(\beta = 0.05\). We run Gibbs sampling for 500 iterations. Each topic \(i\) in our model that is associated with the query term \(q\) (i.e., \(p(i \mid q) &gt; 0.05\)) forms the basis of one expanded query. To compare our search results with the baseline and other search algorithms, we merge the tweets retrieved by all expanded queries into one ranking. For each tweet \(d\) found for a query \(q\), we calculate a ranking score \(sc_q(i, d)\), which depends on the topic \(i\) (i.e., the expanded query) for which the tweet was found and on the tweet \(d\) itself. 
The score combines the probability \(p(i \mid q)\) that the query term \(q\) belongs to topic \(i\), the proportion \(\theta_{d,i}\) of topic \(i\) in tweet \(d\), and the BM25 score \(\mathrm{BM25}(d)\) of the tweet:</p><formula>\[ sc_q(i, d) = p(i \mid q) \cdot \theta_{d,i} + \mathrm{BM25}(d) \]</formula><p>This score makes it possible to combine the results of all expanded queries for a query term into one ranking, which is needed to compare precision with the other approaches.</p></div>
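<div xmlns="http://www.tei-c.org/ns/1.0"><p>The ranking score above can be sketched in Python as follows. Several details are our assumptions for illustration and are not specified in this paper: the BM25 variant (k1 = 1.2, b = 0.75, with a non-negative IDF), treating a tweet as found by an expanded query when its BM25 score is positive, ignoring linked content, and keeping the maximum score when a tweet is found by several expanded queries. The 500-tweet cut-off mirrors the restriction used for computing MAP. The function names are hypothetical.</p><eg>
```python
import math
from collections import Counter


def bm25_scores(docs, query_terms, k1=1.2, b=0.75):
    """Okapi BM25 score of every tokenized tweet for a bag of query terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(w for d in docs for w in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def merged_ranking(docs, theta, expansions, limit=500):
    """Merge the results of all expanded queries into one ranking.

    expansions: list of (topic i, p(i|q), expansion terms).
    theta[d][i]: proportion of topic i in tweet d.
    Score: sc_q(i, d) = p(i|q) * theta[d][i] + BM25(d).
    """
    best = {}
    for i, p_iq, terms in expansions:
        for d, bm25 in enumerate(bm25_scores(docs, terms)):
            if bm25 > 0:  # assumption: tweet d counts as found by expanded query i
                score = p_iq * theta[d][i] + bm25
                best[d] = max(best.get(d, score), score)  # assumption: keep the best score
    return sorted(best, key=best.get, reverse=True)[:limit]


if __name__ == "__main__":
    tweets = [["obama", "vote", "speech"], ["vote", "elect", "voter"],
              ["game", "score", "goal"], ["obama", "ebola"]]
    theta = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]  # toy topic proportions
    expansions = [(0, 0.6, ["obama", "speech"]), (1, 0.4, ["vote", "elect"])]
    print(merged_ranking(tweets, theta, expansions))  # [0, 1, 3]
```
</eg><p>In this toy example, the first tweet is found by both expanded queries but appears only once, with the higher of its two scores; the sports tweet matches neither expanded query and is not ranked.</p></div>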
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head><p>The results differ from query to query. Mean average precision (MAP) is 0.101 for the baseline approach (BL), 0.152 for the co-occurrence-based approach (CB), and 0.152 for the topic-based approach (TB)<ref type="foot" target="#foot_5">7</ref>. Both query expansion approaches clearly improve MAP over the baseline. CB outperforms TB only for the query "Ukraine"; since TB achieves higher average precision for all other queries, both approaches end up with the same MAP (see Table <ref type="table" target="#tab_0">2</ref>). Less general queries, such as "Ebola", are less likely to benefit from query expansion because most tweets on the topic contain the keyword itself; for "Ebola", the baseline even achieves the highest average precision. In contrast, tweets about baseball are much more likely to contain words such as "MLB" instead of the word "baseball".</p><p>The expanded queries also give an overview of the topic. The co-occurrence-based approach produces only one expanded query, whereas our topic-based approach finds multiple topics for a given keyword and can thus create multiple expanded queries representing subtopics. Table <ref type="table" target="#tab_1">3</ref> shows how our approach identifies different subtopics related to sports, namely English soccer, injuries, and American sports, while the co-occurrence-based approach fails to give a good overview and mixes various sports-related terms. The results are similar for the query "Ebola", for which our approach identifies a topic related to Ebola in the U.S. versus Africa.</p></div>
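<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a quick arithmetic check, the following Python snippet recomputes the MAP values reported above from the per-query average precision in Table 2 and counts the queries for which TB beats CB. The function average_precision shows the standard per-query definition (normalized by the number of relevant tweets); that this exact definition was used is our assumption, and the function names are hypothetical.</p><eg>
```python
def average_precision(ranking, relevant):
    """Average precision of one ranked list, normalized by the number of relevant tweets."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(ap_values):
    return sum(ap_values) / len(ap_values)


# Per-query average precision from Table 2
# (sports, baseball, basketball, politics, Ukraine, Ebola).
TABLE_2 = {
    "BL": [0.0035, 0.0033, 0.0038, 0.0000, 0.2730, 0.3232],
    "CB": [0.0057, 0.2578, 0.0106, 0.0158, 0.4595, 0.1617],
    "TB": [0.0150, 0.3175, 0.0158, 0.0166, 0.3068, 0.2403],
}

if __name__ == "__main__":
    for method, values in TABLE_2.items():
        print(method, round(mean_average_precision(values), 3))
    tb_wins = sum(tb > cb for tb, cb in zip(TABLE_2["TB"], TABLE_2["CB"]))
    print("TB beats CB on", tb_wins, "of", len(TABLE_2["TB"]), "queries")
```
</eg><p>The snippet prints BL 0.101, CB 0.152, and TB 0.152, and reports that TB beats CB on 5 of the 6 queries, the exception being "Ukraine".</p></div>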
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>Because the co-occurrence-based expansion is computed specifically for each query, its expansion terms tend to be well suited to the query. Yet, especially for the more general queries, the expanded queries can become ambiguous, i.e., they contain more than one specific topic and cause considerable topic shifts. In contrast to the co-occurrence-based approach, our topic-based approach discovers more relevant terms for a given query. As a result, the focus of the search can shift to a broader topic than the original one. Another strength of our topic-based approach is its flexibility: the query can be expanded with a variable number of topics, and the inherent subtopics can be visualized.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>We have analyzed the use of topic models to support general keyword queries in microblog search. We proposed a query expansion method that uses latent Dirichlet allocation to find relevant tweets and to group them based on latent topic information. Our experiments show that our approach outperforms standard keyword-based search and achieves results competitive with a state-of-the-art microblog-specific query expansion algorithm. While standard search algorithms do not cluster search results by default, our approach returns tweets from various subtopics, and the topics themselves can be inspected to get a quick overview of what is currently being discussed on Twitter in relation to general keywords. Besides a larger-scale evaluation, for future work we are interested in how topics develop over time. Since Twitter is a highly dynamic platform, we hope to capture trending subtopics for general keywords by replacing LDA with a dynamic topic model.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: System architecture</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 2 :</head><label>2</label><figDesc>Average precision for various algorithms for particular queries</figDesc><table><row><cell></cell><cell>sports</cell><cell>baseball</cell><cell>basketball</cell><cell>politics</cell><cell>Ukraine</cell><cell>Ebola</cell></row><row><cell>BL</cell><cell>0.0035</cell><cell>0.0033</cell><cell>0.0038</cell><cell>0.0000</cell><cell>0.2730</cell><cell>0.3232</cell></row><row><cell>CB</cell><cell>0.0057</cell><cell>0.2578</cell><cell>0.0106</cell><cell>0.0158</cell><cell>0.4595</cell><cell>0.1617</cell></row><row><cell>TB</cell><cell>0.0150</cell><cell>0.3175</cell><cell>0.0158</cell><cell>0.0166</cell><cell>0.3068</cell><cell>0.2403</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3 :</head><label>3</label><figDesc>Example expanded queries for the topic-based approach (TB) and the co-occurrence-based approach (CB)<ref type="bibr" target="#b8">[9]</ref> for the queries "sports" and "Ebola"</figDesc><table><row><cell cols="4">sports</cell><cell cols="3">Ebola</cell></row><row><cell>CB</cell><cell cols="3">TB</cell><cell>CB</cell><cell cols="2">TB</cell></row><row><cell>girls</cell><cell>sports</cell><cell>sports</cell><cell>sports</cell><cell>outbreak</cell><cell>ebola</cell><cell>ebola</cell></row><row><cell>cespedes</cell><cell>united</cell><cell>hurt</cell><cell>game</cell><cell>americans</cell><cell>dallas</cell><cell>nigeria</cell></row><row><cell>boston</cell><cell>goals</cell><cell>head</cell><cell>football</cell><cell>free</cell><cell>health</cell><cell>free</cell></row><row><cell>football</cell><cell>game</cell><cell>butt</cell><cell>week</cell><cell>officially</cell><cell>save</cell><cell>big</cell></row><row><cell>betting</cell><cell>score</cell><cell>error</cell><cell>win</cell><cell>declared</cell><cell>hospital</cell><cell>plan</cell></row><row><cell>sports</cell><cell>mufc</cell><cell>vixx</cell><cell>state</cell><cell>virus</cell><cell>nurse</cell><cell>reason</cell></row><row><cell>pretend</cell><cell>west</cell><cell>body</cell><cell>nba</cell><cell>nigeria</cell><cell>patient</cell><cell>declared</cell></row><row><cell>smh</cell><cell>liverpool</cell><cell>button</cell><cell>team</cell><cell>health</cell><cell>care</cell><cell>immediate</cell></row><row><cell>females</cell><cell>man</cell><cell>touch</cell><cell>play</cell><cell>ebola</cell><cell>disease</cell><cell>someone's</cell></row><row><cell>yahoo</cell><cell>manchester</cell><cell>work</cell><cell>season</cell><cell>obama</cell><cell>africa</cell><cell>capricorn</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">TREC http://trec.nist.gov</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">Twitter https://blog.twitter.com/2014/building-a-complete-tweet-index</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">Google News http://news.google.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">We use Mallet http://mallet.cs.umass.edu</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">TREC microblog data http://trec.nist.gov/data/tweets</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">To create comparable MAP scores, each ranking is restricted to 500 tweets</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Query expansion for microblog retrieval</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bandyopadhyay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ghosh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mitra</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">TREC</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="368" to="380" />
			<date type="published" when="2012">2012</date>
			<publisher>NIST</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Latent Dirichlet allocation</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="993" to="1022" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Automatic summarization of events from social media</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">C T</forename><surname>Chua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Asur</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICWSM</title>
				<imprint>
			<publisher>AAAI</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="81" to="90" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Improving retrieval of short texts through document expansion</title>
		<author>
			<persName><forename type="first">M</forename><surname>Efron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Organisciak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Fenlon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="911" to="920" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The early bird catches the news: Nine things you should know about micro-blogging</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Haenlein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Business Horizons</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="105" to="113" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">What is Twitter, a social network or a news media?</title>
		<author>
			<persName><forename type="first">H</forename><surname>Kwak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Moon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">WWW</title>
		<imprint>
			<biblScope unit="page" from="591" to="600" />
			<date type="published" when="2010">2010</date>
			<publisher>ACM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Towards social data platform: Automatic topic-focused monitor for Twitter stream</title>
		<author>
			<persName><forename type="first">R</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C C</forename><surname>Chang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">VLDB</title>
				<imprint>
			<publisher>VLDB Endowment</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1966" to="1977" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Exploiting real-time information retrieval in the microblogosphere</title>
		<author>
			<persName><forename type="first">F</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Qiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<publisher>ACM</publisher>
			<biblScope unit="page" from="267" to="276" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Incorporating query expansion and quality indicators in searching microblog posts</title>
		<author>
			<persName><forename type="first">K</forename><surname>Massoudi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tsagkias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Rijke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Weerkamp</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECIR</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="362" to="367" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">#twittersearch: a comparison of microblog search and web search</title>
		<author>
			<persName><forename type="first">J</forename><surname>Teevan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ramage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Morris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WSDM</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="35" to="44" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Tie-breaker: A new perspective of ranking and evaluation for microblog retrieval</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Darko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Fang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">TREC</title>
		<imprint>
			<publisher>NIST</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A biterm topic model for short texts</title>
		<author>
			<persName><forename type="first">X</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WWW</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1445" to="1456" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A comparative study of utilizing topic models for information retrieval</title>
		<author>
			<persName><forename type="first">X</forename><surname>Yi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Allan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECIR</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="29" to="41" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
