Blog Distillation via Sentiment-Sensitive Link Analysis

Giacomo Berardi, Andrea Esuli, Fabrizio Sebastiani, and Fabrizio Silvestri
Istituto di Scienza e Tecnologie dell'Informazione
Consiglio Nazionale delle Ricerche
56124 Pisa, Italy
E-mail: {firstname.lastname}@isti.cnr.it

Abstract

In this paper we present a new approach to blog distillation, defined as the task in which, given a user query, the system ranks the blogs in descending order of relevance to the query topic. Our approach is based on the idea of adding a link analysis phase to the standard retrieval-by-topicality phase. However, differently from other link analysis methods, we try to determine whether a given hyperlink is a citation of a positive or a negative nature, i.e., whether it expresses approval or disapproval of the linked page by the linking page. We report the results of testing our method on the Blogs08 collection used in the 2008 and 2009 editions of the TREC Blog Track.

1 Introduction

Blog distillation is a subtask of the blog search task. It is defined as the task of ranking in decreasing order of relevance the set of blogs in which the topic expressed by the query q is a recurring topic of interest. Blog distillation has mainly been tackled within the TREC Blog Track [14, 15], where participants have experimented with various combinations of (i) methods for retrieval by topicality and (ii) sentiment analysis methods. Retrieval by topicality is needed since topicality is a key aspect of blog distillation, while sentiment analysis is needed since blogs tend to contain strongly opinionated content, which makes the analysis of opinions (an aspect orthogonal to topic) necessary.

In this paper we propose a strategy in which, on top of a standard method for retrieval by topicality, we add a link analysis phase, which is meant to account for the reputation of the blog. Link analysis was extremely popular in Web search in the late '90s and early '00s [4], due to the success of the PageRank algorithm and the Google search engine, which had PageRank at the center of its ranking strategy. However, link analysis has witnessed a somewhat decreased interest since the late '00s, mainly because the "link-as-endorsement" hypothesis (i.e., the hypothesis that the presence of a hyperlink denotes an endorsement of the linked page on the part of the linking page) is less and less justified in the Web at large, given that Web pages are now frequently generated by automated Web authoring software rather than by people.

This work is based on the assumption that, unlike in the Web at large, the "link-as-endorsement" hypothesis can still be assumed true in the blogosphere, largely because blogs and blog posts are authored by humans. However, due to the highly opinionated nature of blog content, we must consider that many hyperlinks express disapproval, not approval, of the linked post on the part of the linking post; as a result, a simplistic link analysis that takes all hyperlinks as expressing approval might lead to unintuitive results. We attempt to solve this problem by performing sentiment-sensitive link analysis, i.e., a random-walk process in which links are seen as representing either endorsements (positive) or rebuttals (negative) and are thus weighted according to their sentimental valence (or "polarity"). We establish whether a given hyperlink transmits positive or negative endorsement by performing sentiment analysis on a text window around the hyperlink.
This paper is organized as follows. In Section 2 we present our approach to blog distillation. Section 3 presents the results of experiments we have performed on the Blogs08 collection used in the 2008 and 2009 editions of the TREC Blog Track. Finally, Section 4 presents related work.

2 Sentiment-Sensitive Link Analysis

Let $B = \{b_1, \ldots, b_n\}$ be a collection of blogs, let each blog $b_i$ be a set of blog posts $b_i = \{p_{i1}, \ldots, p_{im_i}\}$, and let $q$ be a query (which will typically be expressed as a sequence of words). The relevance of a blog to a query $q$ is typically computed in terms of the relevance of its posts to $q$. In this paper we propose to compute blog relevance via the combination of a classic method for computing topical relevance (e.g., BM25) and a link analysis method. We first describe our link analysis method, after which we describe the combination of the scores resulting from the two methods.

Let $R_q = \langle \beta_1, \ldots, \beta_k \rangle$ be the list of the $k$ top-ranked blogs from a blog collection $B$ as returned by a topicality-based retrieval system for query $q$. Each retrieved blog $\beta_i \in B$ is then associated with $retrieval_i$, a score returned by the topicality-based retrieval system. In our method, as we shall see in the remaining part of this section, associated with each blog $\beta_i \in B$ and with each blog post $p_{ij} \in \beta_i$ are two link-based scores, $r^{blog}_i$ and $r^{post}_{ij}$, respectively. They are computed on the graph formed by considering blogs (or posts) as vertices and hyperlinks as edges. Link-based scores are obtained by applying a particular implementation of the random walk with restart (RWR) procedure [21], which mimics the behaviour of a generic blog surfer who reads a post and decides to follow a link only if she is convinced to do so by the text surrounding the link (the anchor text); in this case we define the link to be positive. If the anchor text discourages the surfer from following the link, then we define the link to be negative; this is the case, e.g., when the anchor text contains a sharp critique of the linked post. When the anchor text surrounding the link contains neither a clear positive nor a clear negative appreciation of the linked page, then we define the link to be neutral; in this case, the blog surfer behaves like the random surfer of "classical" link analysis.

To model the behaviour of this blog surfer we have modelled the post (resp., blog) graph $G = (V, E)$ as the union of two graphs, namely $G^+ = (V^+, E^+)$ and $G^- = (V^-, E^-)$, where $V$ is the set of nodes representing all the posts (resp., blogs) analysed, and $(v_1, v_2) \in E$ if there exists a link interconnecting the posts (resp., blogs) represented by $v_1$ and $v_2$. The edges of $G$ are subdivided into two subsets: $E^+$ (resp., $E^-$) is the set of edges with a polarity greater than (resp., less than or equal to) a given threshold $\lambda$, with $V^+$ (resp., $V^-$) the set of nodes that are endpoints of at least one edge in $E^+$ (resp., in $E^-$). Each edge $(v_i, v_j) \in E$, and thus also each edge in $E^+$ and $E^-$, is weighted by the sentiment associated with that link. By $p(v_i, v_j)$ we denote the transition probability in $G^+$; we set $p(v_i, v_j) = 0$ if $(v_i, v_j) \notin E^+$. We discuss in Section 2.1 the details of how weights are computed according to a sentiment analysis method. Each polarity score $s_{ij}$ is turned into a transition probability $p(v_i, v_j)$ in $G^+$, while each weight in $G^-$ is equal to the absolute value of $s_{ij}$ if $(v_i, v_j) \in E^-$, and 0 otherwise.
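To make this construction concrete, the following is a minimal sketch of how a signed edge list might be split into $G^+$ and $G^-$ and how the $G^+$ weights might be turned into transition probabilities. The function names, the edge-triple format, and the shift-by-$\lambda$ normalization are illustrative assumptions on our part; the paper does not commit to a specific mapping from polarity scores to probabilities.

```python
from collections import defaultdict

def split_graph(edges, lam):
    """Split a signed edge list into G+ and G-.

    edges: iterable of (v_i, v_j, s_ij) triples, where s_ij is the
    sentiment score of the link from v_i to v_j (illustrative format).
    Edges with polarity above lam go to G+; the others go to G-,
    weighted by |s_ij|, as described in the text.
    """
    g_pos, g_neg = defaultdict(dict), defaultdict(dict)
    for vi, vj, s in edges:
        if s > lam:
            g_pos[vi][vj] = s
        else:
            g_neg[vi][vj] = abs(s)
    return g_pos, g_neg

def transition_probabilities(g_pos, lam):
    """Turn the polarity scores of G+ into transition probabilities.

    Since lam may be negative, scores in G+ may themselves be slightly
    negative; shifting by lam before row-normalizing (an assumption of
    this sketch) makes all weights positive.
    """
    p = {}
    for vi, out in g_pos.items():
        shifted = {vj: s - lam for vj, s in out.items()}
        total = sum(shifted.values())
        p[vi] = {vj: w / total for vj, w in shifted.items()}
    return p
```

For example, with $\lambda = -0.01$ every link whose context scores above $-0.01$ contributes to the positive graph, while a sharply critical link with $s_{ij} = -0.5$ contributes an edge of weight 0.5 to $G^-$.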
On the nodes of $G^+$ we compute a score $r^+[v]$ obtained by performing a RWR from the nodes in $V^+$. Restart probabilities are uniformly distributed among the nodes corresponding to posts in the result set $R$ obtained by the baseline retrieval method (as described above). In order to compute the random walk we actually apply a refined version of the popular "power method": (1) we build the matrix $M^+$ associated with $G^+$ by naturally setting $m^+_{ij} = p(v_i, v_j)$, and (2) we partition $M^+$ into $K + 1$ blocks as described in [11] (the "+1" block contains all the dangling nodes, which are treated separately). See [11] for more details. The final vector of link-based scores $r$ is obtained from $r^+$ by applying the transformation

$$r = r^+ \cdot (1 - \theta (M^-)^T r^+)$$

through the matrix $M^-$ associated with $G^-$. In the transformation, $\theta$ is a smoothing factor that is used to mitigate the effect of negatively weighted anchor texts.

Finally, the overall score of a blog with respect to a query $q$ is computed according to the following procedure:

1. Let $r^{post}$ and $r^{blog}$ be the stationary distributions of the post and blog graphs;

2. Retrieve the top-$n$ post set $R$ ranked according to a BM25 score on the query $q$;

3. Compute a relevance score for each retrieved post $j$ as

$$score^{post}_j = retrieval_j + w \, r^{post}[j]$$

where $w$ is the weight assigned to the link analysis score;

4. Finally, compute the relevance score for a blog $b_i$ as

$$score^{blog}_i = \frac{|R \cap B_i|}{|B_i|} \left( \sum_{j \in B_i} \frac{score^{post}_j}{|B_i|} + w \, r^{blog}[i] \right)$$

where $w$ is again the weight assigned to the link analysis score, and $B_i$ is the set of blog posts of blog $i$.

The top-$r$ scoring blogs are returned for the query $q$.

2.1 Sentiment-based Edge Weights

The RWR method we propose is based on the assumption that every edge in the graph $G$ is assigned a weight $s_{ij}$ that captures the sentiment-related properties of the links that the edge represents: a positive (or negative) value of $s_{ij}$ indicates a positive (or negative) attitude toward the linked document, and the absolute value of $s_{ij}$ indicates the intensity of that attitude.

In order to automatically assign sentiment scores to links we have built a simple sentiment analysis system that is based on recognizing sentiment patterns in the text surrounding the link (including the text that defines the link). The sentiment pattern recognition component is based on POS tagging the text (for this we have used the Natural Language Toolkit, available at http://www.nltk.org) and selecting as candidate patterns all the sequences of words matching the (RB|JJ)+ and NN+ patterns. A sentiment score is assigned to each pattern by asking SentiWordNet [3] to assign a sentiment score to each term in the pattern. The pattern is then checked for the presence of sentiment modifiers, e.g.:

• negators, which invert polarity (e.g., "no", "not");

• intensifiers and diminishers (e.g., "very", "strongly", "barely", "almost").

As the resource for sentiment modifiers we have used the appraisal lexicon defined in [2]. When a modifier is found, the score of the word that follows it is modified accordingly (e.g., "very good" is assigned a score twice as positive as that of "good"). The sentiment scores of all the words in a pattern $p$ are summed up, taking modifiers into account, thus resulting in a sentiment score $s_p$ for the pattern. The sentiment scores of all the patterns that appear in the same sentence as the link are then summed up to determine the sentiment-based weight of the link. The sum function takes into account the distance of each pattern from the link, modelling the hypothesis that the closer a sentiment pattern is to the link, the more it is related to it. Distance depends on the discourse function of the words between the pattern and the link (e.g., "than", which separates the two terms of a comparison, increases the distance) and on their part of speech.

The sentiment weight $s_{ij}$ assigned to an edge is equal to the sentiment score assigned to the corresponding link. If an edge corresponds to more than one link in the text, its weight is equal to the sum of the sentiment scores of all the links (if both positive links and negative links are present, they compensate each other).
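As an illustration of this weighting scheme, here is a much simplified sketch. The toy lexicon, the modifier values, and the $1/(1+d)$ distance decay are illustrative assumptions of ours; the actual system uses NLTK POS tags, SentiWordNet [3], the appraisal lexicon of [2], and the discourse-based distances described above.

```python
# Toy stand-ins for SentiWordNet and the appraisal lexicon (assumptions).
LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.6, "awful": -0.8}
NEGATORS = {"no", "not", "never"}
INTENSIFIERS = {"very": 2.0, "strongly": 2.0, "barely": 0.5, "almost": 0.5}

def score_pattern(words):
    """Score one candidate pattern (a (RB|JJ)+ or NN+ word sequence)."""
    score, modifier = 0.0, 1.0
    for w in words:
        if w in NEGATORS:
            modifier *= -1.0                 # polarity inversion
        elif w in INTENSIFIERS:
            modifier *= INTENSIFIERS[w]      # intensify or diminish next word
        else:
            score += modifier * LEXICON.get(w, 0.0)
            modifier = 1.0                   # modifiers apply to the next word
    return score

def link_weight(patterns_with_distance):
    """Combine pattern scores into one link weight, decaying with distance.

    patterns_with_distance: list of (words, distance) pairs, where distance
    grows with intervening discourse markers (e.g., "than") and their POS.
    The 1/(1+d) decay is an illustrative choice, not the paper's exact rule.
    """
    return sum(score_pattern(ws) / (1.0 + d) for ws, d in patterns_with_distance)
```

For instance, link_weight([(["very", "good"], 0), (["awful"], 3)]) weights the adjacent "very good" pattern fully (score 1.2) while discounting the distant "awful" pattern to -0.2, yielding a mildly positive link.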
3 Experiments

3.1 Experimental setting

3.1.1 The dataset

We have tested our method on the Blogs08 collection used in the 2008 and 2009 editions of the TREC Blog Track [15]. Blogs08 consists of a crawl of 1,303,520 blogs, crawled from 14 Jan 2008 to 10 Feb 2009. The crawl identified a total of 28,488,766 blog posts, each identified by a unique URL. Each blog post was actually downloaded two weeks after its first identification by the crawler, in order to also include a number of comments on the post from the readers of the blog. For our experiments we have followed the protocol of the 2009 Blog Track, using the 50 queries of 2009 and their relevance judgments [14].

3.1.2 Link graphs

We have processed the links in the collection in order to produce two graphs:

• a graph of blog posts, in which the nodes of the graph are the blog posts and a directed edge from post x to post y exists iff x contains a link to y;

• a graph of blogs, in which the nodes of the graph are the blogs and a directed edge from blog X to blog Y exists iff a post x from blog X contains a link to the homepage of blog Y.

The graph of blog posts contains 4,697,700 nodes (which means that about 60% of the posts are not linked in any way to other posts) and 12,633,788 edges. The undirected version of the graph of blog posts is composed of 257,227 connected components, the largest of which is composed of 3,985,132 nodes. The graph of blogs is composed of 634,313 nodes (with the largest connected component consisting of 628,670 nodes) and 5,533,981 edges.

3.1.3 Link polarity

All the links relative to the edges composing the two graphs have been processed by our link polarity detector (see Section 2.1). The processing of the links relative to the graph of blog posts resulted in the identification of 8,947,325 neutral links (i.e., with polarity weight equal to zero), 1,906,182 positive links, and 1,780,281 negative links. When an edge between two nodes corresponds to more than one link, the polarity value of the edge is determined as the average of the polarities of the links. Figure 1 shows the distribution of polarity scores for the edges in the blog posts graph.

[Figure 1: Distribution of polarity scores for the edges in the blog posts graph]

3.1.4 Evaluation measures

The experimental results have been evaluated by using the two evaluation measures that are typically adopted in the Blog Track, mean average precision (MAP) and binary preference (bPref) [1]. MAP is defined as the average of the precision scores obtained by considering the first $k$ results, for all the relevant values of $k$ (i.e., the positions at which a relevant document is retrieved), that is,

$$MAP = \frac{1}{|R|} \sum_{1 \le k \le |C|} r_k \cdot precision(k)$$

where $R$ is the set of documents that are relevant to the query, $C$ is the whole collection, and $r_i = 1$ if the document in position $i$ of the results is relevant to the query, $r_i = 0$ otherwise. Precision is defined as $precision(k) = \frac{1}{k} \sum_{1 \le i \le k} r_i$. We will use the notation P@k as a shortcut for $precision(k)$.

The bPref measure [1] is proportional to the average number of times a non-relevant document appears in the ranking after a relevant document, i.e.,

$$bPref = \frac{1}{|R||N|} \sum_{1 \le k \le |C|} r_k \cdot |\{j \mid r_j = 0, j > k\}|$$

where $N$ is the set of documents in the collection that are non-relevant to the query. The highest possible value of both MAP and bPref is 1, and it is attained when all the relevant documents are placed above the non-relevant ones in the ranking.
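A direct transcription of these two formulas into code may help make the conventions explicit. This is a sketch under the assumption that rels is the 0/1 relevance vector of the full ranking; the function names are ours, and the bPref variant follows the formula given above, which differs from the more common formulation of [1].

```python
def precision_at(k, rels):
    """P@k, where rels is the list of 0/1 relevance flags of the ranking."""
    return sum(rels[:k]) / k

def mean_average_precision(rels, num_relevant):
    """MAP for one query: average P@k over the ranks k of relevant documents,
    normalized by |R| (num_relevant, which may exceed the retrieved ones)."""
    ap = sum(precision_at(k, rels)
             for k in range(1, len(rels) + 1) if rels[k - 1])
    return ap / num_relevant

def bpref(rels, num_relevant, num_nonrelevant):
    """bPref as transcribed from the formula above: for each relevant
    document, count the non-relevant documents ranked after it, and
    normalize by |R||N|."""
    total = 0
    for k, r in enumerate(rels):
        if r:
            total += sum(1 for j in rels[k + 1:] if j == 0)
    return total / (num_relevant * num_nonrelevant)
```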
3.1.5 Experimental protocol

As the indexing system for the Blogs08 collection we have used Terrier [18]. With Terrier we have built a baseline for the comparison of our results, based on ranking the documents in the collection by their BM25 relevance with respect to each query. These rankings have also been used as the input to the RWR-based reranking phase of our method.

As the damping factor for the RWR method we have used the typical value $\alpha = 0.85$ [5]; we leave the optimization of this parameter to future work. The epsilon value used to stop the iterative process is set to $\epsilon = 10^{-9}$. We have tested two sets of values for the $\lambda$ parameters that determine the thresholds for splitting the graphs into the two graphs $G^+$ and $G^-$ used by our method:

• $\Lambda_1$: $\lambda = -0.01$ for the blog posts graph, and $\lambda = -0.005$ for the blogs graph;

• $\Lambda_2$: $\lambda = -0.1$ for the blog posts graph, and $\lambda = -0.02$ for the blogs graph.

The $\Lambda_1$ pair of values was set so that 87% of the edges of $G$ belong to the $G^+$ graphs; the $\Lambda_2$ values produce a split in which 91% of the edges of $G$ belong to the $G^+$ graphs.

For each query we retrieve from the collection the first million results, ranked by TF-IDF. These million posts are the nodes with non-null restart probability given as input to the RWR method. We have tested various combinations of the $\theta$, $w$, and $n$ parameters:

• $\theta$ is the smoothing factor that is used to mitigate the effect of negatively weighted anchor texts; we have tested the values 0.1 and 1.0, the first chosen so as to obtain a minimal effect and the second a maximal effect.

• $w$ is the relative weight assigned to the RWR score with respect to the BM25 score assigned by the baseline system. We have tested $w = 2, 3, 4, 5$. These values have been chosen empirically; we have not obtained better rerankings with other weights.

• $n$ is the number of posts in the baseline BM25-based ranking that are reranked by using the RWR scores. We have tested $n = 1000, 2000, 3000, 4000$. These values have been chosen empirically, since adding more non-relevant posts can introduce noise into the reranking.
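For concreteness, a minimal dense-matrix sketch of the random walk with restart and of the final $G^-$ adjustment follows. It uses the parameters just listed ($\alpha = 0.85$, $\epsilon = 10^{-9}$) but omits the block partitioning of [11] and the special treatment of dangling nodes; the function names and the column-stochastic convention for $M^+$ are our own illustrative choices.

```python
import numpy as np

def rwr(M_pos, restart, alpha=0.85, eps=1e-9, max_iter=1000):
    """Power iteration for the random walk with restart on G+.

    M_pos:   column-stochastic transition matrix of G+ (n x n).
    restart: restart distribution, uniform over the posts in the
             baseline result set and zero elsewhere.
    """
    r = restart.copy()
    for _ in range(max_iter):
        r_next = alpha * (M_pos @ r) + (1.0 - alpha) * restart
        if np.abs(r_next - r).sum() < eps:  # L1 convergence test
            return r_next
        r = r_next
    return r

def adjust_with_negative(r_pos, M_neg, theta):
    """r = r+ * (1 - theta * (M-)^T r+): dampen the scores of nodes that
    receive negatively weighted links, proportionally to the importance
    of the linking nodes and to the smoothing factor theta."""
    return r_pos * (1.0 - theta * (M_neg.T @ r_pos))
```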
3.2 Results

Evaluations with different combinations of parameter values do not show large differences; this can be seen in Table 2 by comparing the best and the mean evaluations. The importance of negative scores, controlled by $\theta$, does not seem relevant to the final ranking: switching between the two values of $\theta$ brings no improvement. We can thus claim that the contribution of $G^-$ is minimal, and that the random walk ranking is solid enough to yield stable results. Likewise, neither the graph cut determined by $\lambda$ nor the weight parameter $w$ has a sizeable effect on the results.

n        1,000    2,000    3,000    4,000
MAP      0.1775   0.1914   0.1958*  0.1949
P@5      0.3347*  0.3143   0.2980   0.2939
P@10     0.2878*  0.2796   0.2653   0.2592
bPref    0.2039   0.2203   0.2226*  0.2210

Table 1: Evaluation of the BM25 baseline. Asterisks indicate the best result for each evaluation measure.

         n=1,000            n=2,000            n=3,000            n=4,000
         best      mean     best      mean     best      mean     best      mean
         (w=5,Λ2)           (w=3,Λ2)           (w=4,Λ1)           (w=3,Λ1)
MAP      0.1802    0.1801   0.1943    0.1942   0.1986    0.1983   0.1977    0.1976
P@5      0.3347    0.3347   0.3184    0.3173   0.3020    0.2980   0.2980    0.2959
P@10     0.2898    0.2880   0.2837    0.2819   0.2735    0.2719   0.2673    0.2650
bPref    0.2056    0.2050   0.2222    0.2220   0.2247    0.2242   0.2230    0.2224

Table 2: Best (in terms of MAP plus P@10) and mean evaluations of the random walk rankings. Best evaluations are shown together with the corresponding parameters; mean evaluations are computed over the different values of w and the divisions of the graphs given by Λ1 and Λ2.

The results show an improvement over the retrieval baseline. The improvement, however, is not strongly significant, due to the sparsity of the graphs: the retrieved posts are often not present in the posts graph, so the RWR scores affect only a small part of the results. Using only blog scores yields better evaluations in some cases. Raising the $n$ parameter increases the improvement over the baseline (e.g., in terms of P@10), because more posts are covered by the random walk scoring, and MAP increases too. Raising $n$ too much, however, introduces less relevant posts into the results, which decreases the improvement.

In order to understand the benefit of using sentiment analysis, we have compared our random walk with a PageRank implementation. PageRank is computed on the whole graph, with edges weighted with uniform transition probability 1/outdegree(node). The differences in the evaluations are insignificant; we show an example in Table 3. The slight improvement of PageRank is probably due to its larger coverage of posts, since PageRank is calculated on a graph that is bigger in terms of nodes. Our algorithm, however, is faster than PageRank, because it works on the reduced transition matrix given by $G^+$.

w      2        3       4        5        6
RWR    0.1981*  0.1983  0.1986*  0.1983   0.1983
PR     0.1978   0.1983  0.1985   0.1988*  0.1987*

Table 3: Comparison of MAP evaluations between our random walk with restart (RWR) and PageRank (PR), with θ = 1.0, Λ1, and n = 3,000. Asterisks indicate the best result for each weight (w = 3 is a tie).
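For reference, the uniform-transition PageRank used in this comparison differs from the RWR sketch shown after Section 3.1.5 only in how the matrix is built; a minimal version of that construction follows (the edge-list format is again an illustrative assumption, and a dense matrix is shown only for clarity; at the scale of our graphs a sparse representation would be needed).

```python
import numpy as np

def uniform_transition_matrix(edges, n):
    """Column-stochastic matrix with p = 1/outdegree(node), as in the
    PageRank comparison above. Dangling nodes are left as zero columns
    here; a full implementation would redistribute their mass."""
    edges = list(edges)
    M = np.zeros((n, n))
    outdeg = np.zeros(n)
    for i, _ in edges:
        outdeg[i] += 1
    for i, j in edges:
        M[j, i] = 1.0 / outdeg[i]
    return M
```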
We finally show a comparison with the approaches of the participants in the TREC Blog Track 2009 (Table 4). Evaluations are made on the official subset of queries, which consists of 40 topics. For each participant the run with the highest MAP is selected; we have chosen the parameters of our algorithm according to this TREC rule.

                   MAP      bPref    P@10
buptpris [13]      0.2756   0.2767   0.3821
ICTNET [23]        0.2399   0.2384   0.3513
USI [9]            0.2326   0.2409   0.3308
RWR                0.2032   0.2349   0.3103
FEUP [20]          0.1752   0.1986   0.3282
uogTr [16]         0.1317   0.1531   0.2333
UAms [22]          0.0803   0.0966   0.1590
knowcenter [12]    0.0624   0.0742   0.1410

Table 4: Evaluations of participants in the TREC Blog Track 2009, ordered by MAP. RWR is our algorithm with n = 3,000, w = 4, Λ1, θ = 1.

The Blog Track participants adopt an approach similar to ours in the retrieval phase: they index post documents and often obtain a baseline ranking, which is then reordered by their algorithms. None of them takes advantage of link analysis; they rather use statistical techniques on terms, improve retrieval with query expansion methods, or exploit page features such as temporal information.

Table 5 reports the average number of iterations and the average time required by our RWR method and by PageRank to converge.

                          λ        iterations   time (secs)
RWR, posts graph          -0.01    107.8        145.7
RWR, posts graph          -0.1     107.6        153.9
PageRank, posts graph     n/a      107.9        160.5
RWR, blogs graph          -0.005   94.5         38.3
RWR, blogs graph          -0.02    94.1         41.8
PageRank, blogs graph     n/a      94.2         47.8

Table 5: Average number of iterations and average time required by our RWR method and by PageRank to converge.

4 Related work

Blog search has been widely studied in recent years, especially in the context of the TREC Blog Track. The latter consists of different subtasks, one of which is blog distillation, which consists of returning blogs that are relevant by topic to a given query. The TREC Blog Track participants take different approaches to the distillation task: most of them do not pay attention to quality- and authority-related aspects of blogs, and mainly use IR and statistical techniques, such as language models and/or query expansion [14, 15]. However, quality-related aspects are important in the blog domain, since content is user-generated and the authoritativeness of authors is thus highly variable [7].

Link analysis in blog distillation. A random walk algorithm has been used in [10] on the TREC Blogs06 collection for the blog distillation task. It is computed on a graph in which vertices are either blogs, posts, or terms, in order to find relations between blogs and query terms. An edge between a blog and a post indicates membership of that post in that specific blog, and an edge between a post and a term means that the term occurs in that specific post. There is an edge between two posts if they are connected by a hyperlink, or if they belong to the same blog. The score attributed to each blog is proportional to the sum of the probabilities of reaching each query term in a predefined number of steps, starting from the blog node. In our work the blog score used for the final ranking is instead obtained via a linear combination of the scores assigned by the random walk to the blog itself and to the posts contained in it. We use sentiment analysis to weight links, while in [10] the transition probabilities are uniform (except for the weights from posts to terms, which are calculated via the tf*idf function).

In [17] different scoring strategies are used to rank blog posts. One strategy is to consider post authority by analysing link structure. [17] does not perform a random walk, but implements a similar notion via the use of "post in-degree". The results of [17] show that post in-degree does not improve accuracy. Our experience has been different, since in our experiments blog reranking via random walk always improves accuracy with respect to a standard text retrieval baseline.

Studies of the blogosphere are often dedicated to extracting social behaviour. This can be done by using link analysis methods, since authors tend to cite, explicitly or implicitly, other bloggers or articles. [6] creates a graph in which authors are linked to their posts and in which people who have commented on a post are also linked to it. [6] presents an algorithm similar to a random walk, since it iteratively obtains post scores by combining the authors' "authority" and "hub" scores. In our link analysis method the only vertices are blogs and blog posts, while the method of [6] also exploits the structure of the blogosphere, using authors as additional vertices and comments as additional hyperlinks.
Additionally, [6] performs no analysis of the sentiment associated with hyperlinks.

Sentiment-sensitive link analysis. Link analysis of a blog network, with sentiment associated with links, is performed in [8] in order to examine trust propagation. The algorithm extracts sentiment from link contexts, and then uses an iterative algorithm and a trust matrix similar to a transition matrix. Trust is propagated according to the concepts of "direct propagation", "co-citation", "transpose trust", and "trust coupling"; a belief matrix is finally computed which represents the relationships of trust between bloggers. [8] tests this approach on a network of political blogs, where trust is used to detect, given two political factions, "like-minded" blogs. While both the approach of [8] and our approach are based on sentiment-sensitive link analysis, the goals are different, since [8] attempts to partition a set of blogs according to political orientation, while we attempt to improve the accuracy of blog retrieval.

The system of [19] also uses link polarity, in an application for the retrieval of text documents with approval/disapproval information. Opinion relations between documents and citations are extracted from the contexts of the citations, using a lexical resource and a syntactic parser, in order to determine the opinion polarity of the relations. We have not used syntactic analysis, relying instead on the distance between the opinionated terms and the anchor text of the link. A user of the system described in [19] can access documents through an interface and can restrict search results to documents cited with a given polarity by other documents; sentiment analysis is thus a tool offered to the final user, and does not affect the retrieval method or its accuracy. It should also be noted that the sentiment analysis techniques used in both [8] and [19] are less sophisticated than the ones we use.

References

[1] James Allan, Ben Carterette, and Joshua Lewis. When will information retrieval be "good enough"? In Proceedings of the 28th ACM International Conference on Research and Development in Information Retrieval (SIGIR'05), pages 433–440, Salvador, BR, 2005.

[2] Shlomo Argamon, Kenneth Bloom, Andrea Esuli, and Fabrizio Sebastiani. Automatically determining attitude type and force for sentiment analysis. In Hans Uszkoreit and Zygmunt Vetulani, editors, Human Language Technology: Challenges of the Information Society (Revised selected papers from the 3rd Language Technology Conference), pages 218–231. Springer Verlag, Heidelberg, DE, 2009.

[3] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the 7th Conference on Language Resources and Evaluation (LREC'10), Valletta, MT, 2010.

[4] Allan Borodin, Gareth O. Roberts, Jeffrey S. Rosenthal, and Panayiotis Tsaparas. Link analysis ranking: Algorithms, theory, and experiments. ACM Transactions on Internet Technology, 5(1):231–297, 2005.

[5] Eric Brill. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543–565, 1995.

[6] Ko Fujimura, Takafumi Inoue, and Masayuki Sugisaki. The EigenRumor algorithm for ranking blogs. In Proceedings of the WWW'05 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics, Chiba, JP, 2005.

[7] Marti A. Hearst, Matthew Hurst, and Susan T. Dumais. What should blog search look like?
In Proceedings of the 2008 ACM Workshop on Search in Social Media (SSM'08), pages 95–98, Napa Valley, US, 2008.

[8] Anubhav Kale, Amit Karandikar, Pranam Kolari, Akshay Java, Tim Finin, and Anupam Joshi. Modeling trust and influence in the blogosphere using link polarity. In Proceedings of the 1st International Conference on Weblogs and Social Media (ICWSM'07), Boulder, US, 2007.

[9] Mostafa Keikha, Mark Carman, Robert Gwadera, Shima Gerani, Ilya Markov, Giacomo Inches, Az A. Alidin, and Fabio Crestani. University of Lugano at the TREC 2009 Blog Track. In Proceedings of the 18th Text REtrieval Conference (TREC'09), Gaithersburg, US, 2009.

[10] Mostafa Keikha, Mark J. Carman, and Fabio Crestani. Blog distillation using random walks. In Proceedings of the 32nd ACM Conference on Research and Development in Information Retrieval (SIGIR'09), pages 638–639, Boston, US, 2009.

[11] Chris P. Lee, Gene H. Golub, and Stefanos A. Zenios. A two-stage algorithm for computing PageRank and multistage generalizations. Internet Mathematics, 4(4):299–327, 2007.

[12] Elisabeth Lex, Michael Granitzer, and Andreas Juffinger. Facet classification of blogs: Know-Center at the TREC 2009 Blog Distillation Task. In Proceedings of the 18th Text REtrieval Conference (TREC'09), Gaithersburg, US, 2009.

[13] Si Li, Huiji Gao, Hao Sun, Fei Chen, Oupeng Feng, and Sanyuan Gao. A study of faceted blog distillation - PRIS at the TREC 2009 Blog Track. In Proceedings of the 18th Text REtrieval Conference (TREC'09), Gaithersburg, US, 2009.

[14] Craig Macdonald, Iadh Ounis, and Ian Soboroff. Overview of the TREC 2009 Blog Track. In Proceedings of the 18th Text REtrieval Conference (TREC'09), Gaithersburg, US, 2009.

[15] Craig Macdonald, Rodrygo L. Santos, Iadh Ounis, and Ian Soboroff. Blog Track research at TREC. SIGIR Forum, 44(1):58–75, 2010.

[16] Richard McCreadie, Craig Macdonald, Iadh Ounis, Jie Peng, and Rodrygo L. Santos. University of Glasgow at TREC 2009: Experiments with Terrier. In Proceedings of the 18th Text REtrieval Conference (TREC'09), Gaithersburg, US, 2009.

[17] Gilad Mishne. Multiple ranking strategies for opinion retrieval in blogs. In Proceedings of the 15th Text REtrieval Conference (TREC'06), Gaithersburg, US, 2006.

[18] Iadh Ounis, Gianbattista Amati, Vassilis Plachouras, Ben He, Craig Macdonald, and Christina Lioma. Terrier: A high performance and scalable information retrieval platform. In Proceedings of the SIGIR'06 Workshop on Open Source Information Retrieval (OSIR'06), Seattle, US, 2006.

[19] Scott S. Piao, Sophia Ananiadou, Yoshimasa Tsuruoka, Yutaka Sasaki, and John McNaught. Mining opinion polarity relations of citations. In Proceedings of the 7th International Workshop on Computational Semantics (IWCS'07), pages 366–371, Tilburg, NL, 2007.

[20] Cristina Ribeiro and Gabriel David. FEUP at the TREC 2009 Blog Track: Temporal evidence in the Faceted Blog Distillation Task. In Proceedings of the 18th Text REtrieval Conference (TREC'09), Gaithersburg, US, 2009.

[21] Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast random walk with restart and its applications. In Proceedings of the 6th International Conference on Data Mining (ICDM'06), pages 613–622, Washington, US, 2006.

[22] Wouter Weerkamp, Manos Tsagkias, and Maarten de Rijke. From blogs to news: Identifying hot topics in the blogosphere. In Proceedings of the 18th Text REtrieval Conference (TREC'09), Gaithersburg, US, 2009.

[23] Xueke Xu, Yue Liu, Hongbo Xu, Xiaoming Yu, Linhai Song, Feng Guan, and Zeying Peng.
ICTNET at Blog Track TREC 2009. In Proceedings of the 18th Text REtrieval Conference (TREC'09), Gaithersburg, US, 2009.