Distributed LDA-based Topic Modeling and Topic Agglomeration in a Latent Space

Gopi Chand Nutakki
Knowledge Discovery & Web Mining Lab
University of Louisville
g0nuta01@louisville.edu

Olfa Nasraoui
Knowledge Discovery & Web Mining Lab
University of Louisville
olfa.nasraoui@louisville.edu

Behnoush Abdollahi
Knowledge Discovery & Web Mining Lab
University of Louisville
b0abdo03@louisville.edu

Mahsa Badami
Knowledge Discovery & Web Mining Lab
University of Louisville
m0bada01@louisville.edu

Wenlong Sun
Knowledge Discovery & Web Mining Lab
University of Louisville
w0sun005@louisville.edu



Abstract

We describe the methodology that we followed to automatically extract topics corresponding to known events provided by the SNOW 2014 challenge in the context of the SocialSensor project. A data crawling tool and selected filtering terms were provided to all the teams. The crawled data was to be divided into 96 (15-minute) timeslots spanning a 24-hour period, and participants were asked to produce a fixed number of topics for the selected timeslots. Our preliminary results are obtained using a methodology that pulls strengths from several machine learning techniques, including Latent Dirichlet Allocation (LDA) for topic modeling and Non-negative Matrix Factorization (NMF) for automated hashtag annotation and for mapping the topics into a latent space where they become less fragmented and can be better related with one another. In addition, we obtain improved topic quality when sentiment detection is performed to partition the tweets based on polarity, prior to topic modeling.

Copyright © by the paper's authors. Copying permitted only for private and academic purposes.
In: S. Papadopoulos, D. Corney, L. Aiello (eds.): Proceedings of the SNOW 2014 Data Challenge, Seoul, Korea, 08-04-2014, published at http://ceur-ws.org

1 SocialSensor: http://www.socialsensor.eu/

Table 1: Description of used variables.

  Symbol   Description
  M        Number of documents in the collection
  W        Number of distinct words in the vocabulary
  N        Total number of words in the collection
  K        Number of topics
  x_di     ith observed word in document d
  z_di     Topic assigned to x_di
  N_wk     Count of word w assigned to topic k
  N_dk     Count of topic k assigned in document d
  φ_k      Probability of a word given topic k
  θ_d      Probability of a topic given document d
  α, β     Dirichlet priors

Figure 1: Topic Modeling Framework (sentiment detection and hashtag annotation are not shown).

1 Introduction

The SNOW 2014 challenge was organized within the context of the SocialSensor project1, which works on developing a new framework for enabling real-time multimedia indexing and search in the Social Web. The aim of the challenge was to automatically extract topics corresponding to known events that were prescribed by the challenge organizers. Also provided was a data crawling tool, along with several Twitter filter terms (syria, ukraine, bitcoin, terror). The crawled data was to be divided into a total of 96 (15-minute) timeslots spanning a 24-hour period, with a goal of extracting a fixed number of topics in each timeslot. Only tweets up to the end of the timeslot could be used to extract any topic. In this paper, we focus on the topic extraction task, instead of input data filtering or the presentation of the associated headline, tweets, and image URL, because this was one of the activities closest to the ongoing research [AN12, HN12, CBGN12] on multi-domain data stream clustering in the Knowledge Discovery & Web Mining Lab at the University of Louisville. To extract topics from the tweets crawled
in each time slot, we use a Latent Dirichlet Allocation (LDA) based technique. We then discover latent concepts using Non-negative Matrix Factorization (NMF) on the resulting topics, and apply hierarchical clustering within the resulting Latent Space (LS) in order to agglomerate these topics into less fragmented themes that can facilitate the visual inspection of how the different topics are inter-related. We have also experimented with adding a sentiment detection step prior to topic modeling in order to obtain a polarity-sensitive topic discovery, and with automated hashtag annotation to improve the topic extraction.

2 Background

2.1 Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA) is a Bayesian probabilistic model for text documents. It assumes a collection of K topics, where each topic defines a multinomial over the vocabulary, which is assumed to have been drawn from a Dirichlet distribution [BNJ03][HBB10]. Given the topics, LDA assumes the generative process for each document d shown in Algorithm 1, where the notation is listed in Table 1. Equation 1 gives the joint distribution of a topic mixture θ, a set of N topics z, and a set of N words w for parameters α and β.

Algorithm 1 Latent Dirichlet Allocation.
Input: A document collection, hyper-parameters α and β.
Output: A list of topics.
  1. Draw a distribution over topics, θ_d ∼ Dir(α)
  2. For each word i in the document:
  3.   Draw a topic index z_di ∈ {1, ..., K} from the topic weights, z_di ∼ θ_d
  4.   Draw the observed word w_di from the selected topic, w_di ∼ β_{z_di}

    p(θ, z, w | α, β) = p(θ | α) ∏_{n=1}^{N} p(z_n | θ) p(w_n | z_n, β)        (1)

Integrating over θ and summing over z, we obtain the marginal distribution of a document [BNJ03]:

    p(w | α, β) = ∫ p(θ | α) ( ∏_{n=1}^{N} Σ_{z_n} p(z_n | θ) p(w_n | z_n, β) ) dθ

Taking the product of the marginal probabilities of single documents, the probability of a corpus D can be obtained:

    p(D | α, β) = ∏_{d=1}^{M} ∫ p(θ_d | α) ( ∏_{n=1}^{N_d} Σ_{z_dn} p(z_dn | θ_d) p(w_dn | z_dn, β) ) dθ_d

The posterior is usually approximated using Markov Chain Monte Carlo (MCMC) methods or variational inference. Both methods are effective, but face significant computational challenges in the face of massive data sets. For this reason, we concentrated on a distributed version of LDA, which is summarized in the next section.

2.2 Distributed Algorithms for LDA

It is possible to distribute non-collapsed Gibbs sampling, because the sampling of z_di can happen independently given θ_d and φ_k, and thus can be done concurrently. In a non-collapsed Gibbs sampler, one samples z_di given θ_d and φ_k, and then θ_d and φ_k given z_di. If individual documents are not spread across different processors, one can marginalize over just θ_d, since θ_d is processor-specific. In this partially collapsed scheme, the latent variables z_di on each processor can be concurrently sampled, where the concurrency is over processors. The slow convergence of partially collapsed and non-collapsed Gibbs samplers (due to the strong dependencies between the parameters and latent variables) has led to devising distributed algorithms for fully collapsed Gibbs samplers [NASW09][YMM09].

Given M documents and P processors, with approximately M_P = M/P documents distributed on each processor p, the M documents are partitioned into x = {x_1, ..., x_p, ..., x_P}, with z = {z_1, ..., z_p, ..., z_P} being the corresponding topic assignments, where processor p stores x_p, the words from documents j = (p − 1)M_P + 1, ..., pM_P, and z_p, the corresponding topic assignments. Topic-document counts N_dk are likewise distributed as N_dkp. The word-topic counts N_wk are also distributed, with each processor p keeping a separate local copy N_wkp.

Algorithm 2 Standard Collapsed Gibbs Sampling.
LDAGibbsItr(x_p, z_p, N_dkp, N_wkp, α, β):
  1. For each d ∈ {1, ..., M}
  2.   For each i ∈ {1, ..., N_dkp}
  3.     v ← x_dpi, T_dpi ← N_dkpi
  4.     For each j ∈ {1, ..., T_dkpi}
  5.       k̂ ← z_dpij
  6.       N_dkp ← N_dkp − 1, N_wkp ← N_wkp − 1
  7.       For k = 1 to K
  8.         ρ_k ← ρ_{k−1} + (N_dkp + α)(N_wkp + β) / (Σ_{w′} N_{w′kp} + Wβ)
  9.       x ∼ UniformDistribution(0, ρ_K)
 10.       k̂ ← BinarySearch(k̂ : ρ_{k̂−1} < x < ρ_{k̂})
 11.       N_dk̂p ← N_dk̂p + 1, N_wk̂p ← N_wk̂p + 1
 12.       z_dpij ← k̂

Although Gibbs sampling is a sequential process, given the typically large number of word tokens compared to the number of processors, the dependence of z_ij on the update of any other topic assignment z_i′j′ is likely to be weak, thus relaxing the sequential sampling constraint. If two processors are concurrently sampling, but with different words in different documents, then concurrent sampling will approximate sequential sampling. This is because the only term affecting the order of the update operations is the total word-topic count Σ_w N_wk. Algorithm 3 shows the pseudocode of the AD-LDA algorithm, which can terminate after a fixed number of iterations or based on a suitable MCMC convergence metric. The AD-LDA algorithm samples from an approximation to the posterior distribution by allowing different processors to concurrently sample topic assignments on their local subsets of the data. AD-LDA works well empirically and accelerates the topic modeling process.

Algorithm 3 Approximate Distributed LDA [NASW09].
Input: A list of M documents, x = {x_1, ..., x_p, ..., x_P}
Output: z = {z_1, ..., z_p, ..., z_P}
  1. Repeat
  2.   For each processor p in parallel do
  3.     Copy global counts: N_wkp ← N_wk
  4.     Sample z_p locally: LDAGibbsItr(x_p, z_p, N_dkp, N_wkp, α, β)  // Alg. 2
  5.   Synchronize
  6.   Update global counts: N_wk ← N_wk + Σ_p (N_wkp − N_wk)
  7. Until termination criterion is satisfied

Figure 2: Dendrogram depicting a few clusters' hierarchy of topics from the initial window. Agglomeration is based on the cosine similarity. Average-Linkage Agglomerative Hierarchical Clustering was used. Distance is computed as (1 − similarity). Refer to the electronic version of the paper for clarity.

Figure 3: Portion of a dendrogram depicting the clusters' hierarchy of topics from the initial window (Number 0). Agglomeration is based on the dot product between the topics' projections on a lower-dimensional latent space extracted using NMF with kf = 30 factors. Average-Linkage Agglomerative Hierarchical Clustering was used. Distance is computed as (1 − similarity). Refer to the electronic version of the paper for clarity.

3 Topic Extraction Methodology

3.1 Data Preprocessing

The dataset consists of tweets that were acquired from the Twitter servers by continuous querying using a wrapper for the Twitter API over a period of 24 hours. The batches of tweets are acquired in raw JSON2 format. Various properties of each tweet, such as the hashtags, URLs, creation time, counts of retweets and favorites, and other user information including the encoding and language, are extracted. The hashtags can provide a good source for creating discriminating features, and they were folded as terms into the bag-of-words model for each tweet where they were present (without the '#' prefix). The URLs can also later provide a method to achieve topic summarization.

2 JSON: JavaScript Object Notation, a text-based open standard designed for human-readable data interchange.

3.2 Topic Extraction Stages

The technique assumes a real-time streaming data input, which is replicated using process calls to the storage records containing the tweets. For AD-LDA, each tweet is considered as a single document. Figure 1 shows the steps performed to extract the topics in each window or time slot. The procedure starts with the extraction of key information from the Twitter JSON; then the tweet text and other properties are used to extract topics. The topic extraction is performed using the following steps:
1. The documents are stripped of non-English characters and converted to lowercase. The stop words are retained for the context information (especially for sentiment detection).

2. Groups of documents are assembled into windows based on their timestamps. A sliding window's width is equal to three consecutive time slots ending in the current time slot.

3. The AD-LDA technique is performed on a 20-node cluster. From each sliding window iteration, a total of 1000 topics are extracted; this higher value helps in extracting finer topics.

4. The topics are ranked based on the proportion of tweets assigned to the topic in the given window, and can then be clustered/merged together to organize them into more general topic groups.

The jsoup open source HTML parser3 was used to extract multimedia content such as images and metadata from the URLs extracted from the tweets. The headlines are part of the metadata, while the keywords are obtained from the topic modeling itself as the terms with the highest probability in the topic.

3 http://jsoup.org/

4 Topic Extraction with AD-LDA and Sentiment Labels

The AD-LDA technique with Gibbs sampling, along with automatically extracted sentiment labels, can also be used to extract polarity-sensitive topics. Using sentiment labels may improve the quality of the topics, as it results in finer topics. Figure 1 depicts the general flow within the used methodology. A weighted Naive Bayes classifier [LGD11], trained with labeled tweet samples4 and a set of labeled tokens5 with known sentiment polarity, can be used to extract the sentiment levels. The tweets are then regrouped based on the sentiment level and topic modeling is applied on each group, resulting in topics that are confined to one sentiment, as illustrated in Table 2.

4 https://github.com/ravikiranj/twitter-sentiment-analyzer
5 https://github.com/linron84/JST

  Positive Sentiment                                  | Negative Sentiment
  optimistic ukraine antiwar nonintervention          | horrible building badge hiding ukraine yanukovych
  syria refugees about education children million     | syria yarmouk camp crisis food waiting unrest shocking
  future technology bitcoins value law accelerating   | cnn protocols loss gox bitcoin fault

Table 2: Illustrating a sample of the finer topics extracted after a preliminary sentiment detection phase.

5 Topic Agglomeration in a Latent Space

5.1 Discovering Latent Factors Among the Discovered Topics Using Non-negative Matrix Factorization (NMF)

Because the initial topic modeling generated a high number of topics (1000 topics per window) that were furthermore very sparse in terms of the descriptive terms within them, these topics were hard to interpret and could benefit from a coarser, less fragmented organization. One way to fix this problem was to merge the topics based on a conceptual similarity by applying Non-negative Matrix Factorization (NMF) [LS99]. Because the topics-by-words matrix is very sparse, we used NMF to project the topics onto a common lower-dimensional latent factors' space. NMF takes as input the matrix X of n topics by m words (as binary features) and decomposes it into two factor matrices (A and B), which represent the topics and words, respectively, in a kf-dimensional latent space, as follows:

    X_{n×m} ≈ A_{n×kf} B^T_{m×kf}        (2)
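To make the factorization in Equation 2 and the alternating least-squares scheme of Algorithm 4 concrete, here is a minimal sketch (our own toy example with random data, not the authors' implementation) that alternates least-squares solves with a projection onto the non-negative orthant:

```python
import numpy as np

def als_nmf(X, kf, iters=200, seed=0):
    """Sketch of ALS-based NMF: X (n x m) is approximated by A (n x kf) @ B.T."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = rng.random((n, kf))  # random non-negative initialization
    B = np.zeros((m, kf))
    for _ in range(iters):
        # Solve the normal equations A^T A B^T = A^T X for B^T, then
        # project onto the non-negative subspace (zero out negatives).
        B = np.linalg.lstsq(A, X, rcond=None)[0].T
        B[B < 0] = 0
        # Solve B B^T A^T = B X^T for A^T, then project likewise.
        A = np.linalg.lstsq(B, X.T, rcond=None)[0].T
        A[A < 0] = 0
    return A, B

# Toy binary "topics by words" matrix (5 topics x 8 words).
X = (np.random.default_rng(1).random((5, 8)) > 0.6).astype(float)
A, B = als_nmf(X, kf=2)
err = np.linalg.norm(X - A @ B.T)  # Frobenius reconstruction error (Eq. 3)
```

In practice, a library routine such as scikit-learn's `NMF` (which also accepts sparse input) would replace this sketch; the rows of A then play the role of the latent-space topic projections used for the similarity computations below.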
where kf is the approximated rank of matrices A and B, and is selected such that kf < min(m, n), so that the number of elements in the decomposition matrices is far less than the number of elements in the original matrix: n·kf + kf·m ≪ nm.

The topics factor (A) can then be used to find the similarity between the topics in the new latent space instead of in the original space of terms. The similarity matrix obtained from the NMF factors can finally be used to cluster the topics.

To find A and B, the Frobenius norm of the errors between the data and the approximation is optimized, as follows:

    J_NMF = ||E||²_F = ||X − AB^T||²_F        (3)

Several algorithms have been proposed in the literature to minimize this cost. We used an Alternating Least Squares (ALS) method [PT94] that iteratively solves for the factors, by assuming that the problem is convex in either one of the factor matrices alone.

Algorithm 4 Basic Alternating Least Squares (ALS) Algorithm for NMF.
Input: Data matrix X, number of factors kf
Output: Optimal matrices A and B
  1. Initialize matrix A (for example, randomly)
  2. Repeat
     (a) Solve for B in the equation: A^T A B^T = A^T X
     (b) Project the solution onto the non-negative matrix subspace: set all negative values in B to zero
     (c) Solve for A in the equation: B B^T A^T = B X^T
     (d) Project the solution onto the non-negative matrix subspace: set all negative values in A to zero
  3. Until the decrease in the cost function falls below a threshold

Figure 4: Portion of a dendrogram depicting the clusters' hierarchy of topics from the first 6 windows. Agglomeration is based on the dot product between the topics' projections on a lower-dimensional latent space extracted using NMF with kf = 30 factors. Average-Linkage Agglomerative Hierarchical Clustering was used. Distance is computed as (1 − similarity). Refer to the electronic version of the paper for clarity.

5.2 Topic Organization Stages: Topic Feature Extraction, Latent Space Computation using NMF, Latent Space-based Topic Similarity Computation, and Hierarchical Clustering

In the following, we summarize the steps that are applied post-discovery of the topics, in order to generate a hierarchical organization from the sparse topics.

1. Preprocessing of the topic vectors: For each window, the topic-word matrix (X_{n×m}) is extracted from the final topic modeling results. The features are the top words in a topic, and they are binary (1 if a topic contains the word in question and 0 otherwise).

2. Latent factor discovery using NMF: The topic-word data was normalized before running NMF. The latter produces two factors (A and B), where n, kf and m are the number of topics, latent factors, and words, respectively. Our main goal was to compute the matrix A, also called the topics basis factor, which transfers the topics to the latent space. Choosing the number of factors, kf, has an impact on the results. After trial and error, we chose kf = 30.

3. Generating the topic-similarity matrix in the latent space: The computed topic basis matrix (A) was used to obtain the similarity in lieu of the original topic vectors. The normalized inner product of the matrix A and its transpose was calculated for this purpose. Normalization of the product is equivalent to computing the cosine similarity between topic pairs. The resulting matrix contains the pairwise similarity between each pair of topics within the latent space.

4. Hierarchical clustering of the latent space-projected topics based on the new pairwise similarity scores computed in Step 3: We experimented with several linkage strategies, such as single and average linkage. The latter was chosen as optimal.

5.3 Automated Hashtag Annotation

We have also experimented with a simple tag completion or prediction step prior to topic modeling. The annotation for a given tweet is determined by finding the top frequent tags associated with the KLS 6 nearest neighboring tweets of the given tweet in the NMF-computed Latent Space. Once the tags are completed, they are used to enrich the tweets before topic modeling. Of course, only the tweets' bag-of-words descriptions in a given window are used to compute the NMF for that window's topic modeling. The annotation generally resulted in a lower perplexity of the extracted topic models, as shown in Figure 6.

6 Results

perplexity score indicating better generalization performance. For a test set of T documents D′ = {w⃗^(1), ..., w⃗^(T)}, with N_d being the total number of keywords in the dth document, the perplexity given in Equation 4 will be lower for a better topic model. Figure 5 shows the perplexity trends, suggesting that more topics result in lower (thus better) perplexity. Also, irrespective of the number of topics, AD-LDA-based topic modeling can extract topics of good quality.

    perplexity(D′) = exp( − ( Σ_{d=1}^{T} ln p(w⃗^(d) | α, β) ) / ( Σ_{d=1}^{T} N_d ) )        (4)

Figure 5: Perplexity trends for each sliding window of width three, for various numbers of extracted topics.

6.2 Sentiment Based Topic Modeling

Table 2 shows a subset of topics extracted from the positive and negative sentiment groups of tweets; these tend to be more refined than the standard sentiment-agnostic topics. From the initial window, 1000 topics were extracted in the same way as with the Distributed LDA; however, the topic modeling was preceded by a sentiment classifier that classifies the tweets based on their sentiment (positive or negative). Although the positive and negative topics still share a few keywords, they are clearly divided by sentiment.

6.3 Topic Clustering in the Latent Space

Figure 3 shows the topic clusters created using the latent space-projected features extracted using NMF. The clusters in Figure 3 seem to have better quality compared to the clusters in Figure 2, because of the more accurate capture of the pairwise similarities between topics in the conceptual space. Figure 4 shows the clustering of the top 10 topics for a series of 6 windows, showing how the agglomeration can consolidate the topics discovered at different time slots, helping avoid excessive fragmentation throughout the stream's life.

6.4 Automated Hashtag Annotation
6.1     Distributed LDA-based Topic Modeling                      Tweet data is very sparse and not every tweet has valu-
                                                                  able tags. To overcome this weakness, we applied an
Figure 2 shows7 a sample of the topic clusters’ hi-
                                                                  NMF-based automated tweet annotation before topic
erarchy extracted from the initial window and with-
                                                                  modeling. Adding the predicted hashtags to the tweets
out NMF-based latent space projection of the top-
                                                                  enhanced the topic modeling. The automated tag an-
ics. The clusters are of debatable quality. Per-
                                                                  notation, described in Section 5.3, generally resulted
plexity is a common metric to evaluate language
                                                                  in lower Perplexity of the extracted topic models, as
models [BL06][BNJ03]. It is monotonically decreas-
                                                                  shown in Figure 6, suggesting that the auto-completed
ing in the likelihood of the test data, with a lower
                                                                  tags did help complete some missing and valuable in-
    6 we report results for K
                              LS = 5
                                                                  formation in the sparse tweet data, thus helping the
    7 Refer to the electronic version of the paper for clarity.   topic modeling.
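The projection-and-agglomeration procedure of Steps 3 and 4, whose output is evaluated in Section 6.3, can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes scikit-learn and SciPy, and a random non-negative matrix stands in for the real LDA topic-word distributions.

```python
import numpy as np
from sklearn.decomposition import NMF
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Stand-in topic-word matrix: 20 topics over a 500-term vocabulary.
# In the paper this would come from the Distributed LDA output.
topics = rng.random((20, 500))

# Step 2-3: project the topics into a lower-dimensional latent space with NMF,
# then take the normalized inner product of A with its transpose, which equals
# the pairwise cosine similarity between topic pairs.
nmf = NMF(n_components=5, init="nndsvda", random_state=0)
A = nmf.fit_transform(topics)                      # 20 x 5 latent topic vectors
A_unit = A / np.clip(np.linalg.norm(A, axis=1, keepdims=True), 1e-12, None)
similarity = A_unit @ A_unit.T                     # 20 x 20 cosine similarities

# Step 4: average-linkage hierarchical clustering on cosine *distance*.
distance = np.clip(1.0 - similarity, 0.0, None)    # guard against FP negatives
condensed = distance[np.triu_indices_from(distance, k=1)]  # condensed form
tree = linkage(condensed, method="average")
labels = fcluster(tree, t=5, criterion="maxclust") # e.g. cut into 5 clusters
print(labels)
```

Average linkage is used here because the paper reports it as the best-performing strategy; swapping `method="single"` reproduces the alternative it was compared against.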
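The tag-completion step of Section 5.3, whose effect on perplexity is evaluated above, can likewise be sketched with a toy corpus. The tweets, hashtags, and K_LS = 3 below are illustrative only (the paper uses K_LS = 5 on real tweet windows), and scikit-learn's NMF and NearestNeighbors stand in for the actual pipeline components.

```python
from collections import Counter

from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy corpus: each tweet has a bag-of-words text and (possibly empty) hashtags.
tweets = [
    "ukraine protest kiev square crowd",
    "protest kiev police clash crowd",
    "new phone release camera battery",
    "phone camera review battery life",
    "ukraine kiev clash police protest",   # untagged tweet to annotate
]
hashtags = [["#ukraine", "#euromaidan"], ["#euromaidan"], ["#tech"], ["#tech"], []]

# Bag-of-words matrix for the window, then NMF projection into a latent space.
X = CountVectorizer().fit_transform(tweets)
Z = NMF(n_components=2, init="nndsvda", random_state=0).fit_transform(X)

# K_LS nearest neighbors in the latent space (3 here; the paper uses 5).
k_ls = 3
nn = NearestNeighbors(n_neighbors=k_ls).fit(Z)

def complete_tags(i, top_n=2):
    """Predict tags for tweet i from the most frequent tags of its neighbors."""
    _, idx = nn.kneighbors(Z[i : i + 1])
    counts = Counter(t for j in idx[0] if j != i for t in hashtags[j])
    return [t for t, _ in counts.most_common(top_n)]

predicted = complete_tags(4)   # annotate the untagged tweet
print(predicted)
```

The predicted tags would then be appended to the tweet's bag of words before the window's topic modeling, which is what produced the perplexity improvement shown in Figure 6.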
Figure 6: Perplexity for different numbers of topics and varying window length, showing improved results when NMF-based automated tweet annotation is performed before topic modeling.

7  Conclusion

Using Distributed LDA topic modeling, followed by NMF and hierarchical clustering within the resulting Latent Space (LS), helped organize the topics into less fragmented themes. Sentiment detection prior to topic modeling and automated hashtag annotation helped improve the learned topic models, while the agglomeration of topics across several time windows can link the topics discovered at different times. Our focus was on topic modeling and organization using the simplest (bag-of-words) features. Specialized Twitter feature extraction and selection methods, such as the ones surveyed and proposed by Aiello et al. [APM+13], have the potential to improve our results, a direction we will explore in the future. Another direction to explore is the news-domain-specific, user-centered approach discussed in [SNT+14], as well as a more expanded use of automated annotation to support topic extraction and description.

8  Acknowledgements

We would like to thank the organizers of the SNOW 2014 workshop, in particular the members of the SocialSensor team, for their leadership in all the phases of the competition.

References

[AN12]    Artur Abdullin and Olfa Nasraoui. Clustering heterogeneous data sets. In Web Congress (LA-WEB), 2012 Eighth Latin American, pages 1-8. IEEE, 2012.

[APM+13]  Luca Maria Aiello, Georgios Petkos, Carlos Martin, David Corney, Symeon Papadopoulos, Ryan Skraba, Ayse Goker, Ioannis Kompatsiaris, and Alejandro Jaimes. Sensing trending topics in Twitter. IEEE Transactions on Multimedia, 2013.

[BL06]    David Blei and John Lafferty. Correlated topic models. Advances in Neural Information Processing Systems, 18:147, 2006.

[BM10]    David M. Blei and Jon D. McAuliffe. Supervised topic models. arXiv preprint arXiv:1003.0783, 2010.

[BNJ03]   David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993-1022, 2003.

[CBGN12]  Juan C. Caicedo, Jaafar BenAbdallah, Fabio A. González, and Olfa Nasraoui. Multimodal representation, indexing, automated annotation and retrieval of image collections via non-negative matrix factorization. Neurocomputing, 76(1):50-60, 2012.

[HBB10]   Matthew Hoffman, David M. Blei, and Francis Bach. Online learning for latent Dirichlet allocation. Advances in Neural Information Processing Systems, 23:856-864, 2010.

[HN12]    Basheer Hawwash and Olfa Nasraoui. Stream-Dashboard: a framework for mining, tracking and validating clusters in a data stream. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, pages 109-117. ACM, 2012.

[LGD11]   Chang-Hwan Lee, Fernando Gutierrez, and Dejing Dou. Calculating feature weights in naive Bayes with Kullback-Leibler measure. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 1146-1151. IEEE, 2011.

[LH09]    Chenghua Lin and Yulan He. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 375-384. ACM, 2009.

[LHAY07]  Yang Liu, Xiangji Huang, Aijun An, and Xiaohui Yu. ARSA: a sentiment-aware model for predicting sales performance using blogs. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 607-614. ACM, 2007.

[LS99]    Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791, 1999.

[LZ08]    Yue Lu and Chengxiang Zhai. Opinion integration through semi-supervised topic modeling. In Proceedings of the 17th International Conference on World Wide Web, pages 121-130. ACM, 2008.

[McC]     MALLET: A machine learning for language toolkit. http://www.cs.umass.edu/mccallum/mallet.

[NASW09]  David Newman, Arthur Asuncion, Padhraic Smyth, and Max Welling. Distributed algorithms for topic models. The Journal of Machine Learning Research, 10:1801-1828, 2009.

[PCA14]   Symeon Papadopoulos, David Corney, and Luca Maria Aiello. SNOW 2014 data challenge: Assessing the performance of news topic detection methods in social media. In Proceedings of the SNOW 2014 Data Challenge, 2014.

[PT94]    Pentti Paatero and Unto Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2):111-126, 1994.

[SNT+14]  S. Schifferes, N. Newman, N. Thurman, D. Corney, A. S. Goker, and C. Martin. Identifying and verifying news through social media: Developing a user-centered tool for professional journalists. Digital Journalism, 2014.

[TM08]    Ivan Titov and Ryan McDonald. Modeling online reviews with multi-grain topic models. In Proceedings of the 17th International Conference on World Wide Web, pages 111-120. ACM, 2008.

[WWC05]   Janyce Wiebe, Theresa Wilson, and Claire Cardie. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165-210, 2005.

[YMM09]   Limin Yao, David Mimno, and Andrew McCallum. Efficient methods for topic model inference on streaming document collections. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 937-946. ACM, 2009.

[ZBG13]   Ke Zhai and Jordan Boyd-Graber. Online topic models with infinite vocabulary. In International Conference on Machine Learning, 2013.