=Paper=
{{Paper
|id=Vol-1646/paper6
|storemode=property
|title=Topic Sentiment Joint Model with Word Embeddings
|pdfUrl=https://ceur-ws.org/Vol-1646/paper6.pdf
|volume=Vol-1646
|authors=Fu Xianghua,Wu Haiying,Cui Laizhong
|dblpUrl=https://dblp.org/rec/conf/pkdd/FuWC16
}}
==Topic Sentiment Joint Model with Word Embeddings==
Topic Sentiment Joint Model with Word Embeddings

Xianghua Fu, Haiying Wu, Laizhong Cui
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
fuxh@szu.edu.cn, whywuhaiying@gmail.com, cuilz@szu.edu.cn

Abstract. The topic sentiment joint model is an extended topic model which aims to detect sentiments and topics simultaneously from online reviews. Most existing topic sentiment joint modeling algorithms infer the resulting distributions from the co-occurrence of words, but when the training corpus is small and the documents are short, the resulting distributions may not be satisfactory. In this paper, we propose a novel topic sentiment joint model with word embeddings (TSWE), which introduces word embeddings trained on an external large corpus. Furthermore, we implement TSWE with a Gibbs sampling algorithm. The experimental results on Chinese and English data sets show that TSWE achieves significant performance in the task of detecting sentiments and topics simultaneously.

1 Introduction

With the rapid development of e-commerce and social media, it is extremely urgent and valuable to automatically analyze reviews in order to detect sentiments and topics simultaneously. Great effort on new methodologies for detecting topics and sentiments simultaneously has flourished in recent years [1-5].

Several works extending probabilistic topic models [6, 7] have been designed to tackle the joint extraction of sentiments and latent topics from documents [2, 3, 8]. The joint sentiment topic model (JST) [2] extends LDA to a four-layer model by adding an additional sentiment layer between the document and topic layers. The topic sentiment mixture (TSM) [8] jointly models topics and sentiments in the corpus on the basis of PLSI. These approaches infer sentiment and topic distributions from the co-occurrence of words within documents. However, when the training corpus is small or the documents are short, the sentiment and topic distributions may not be very satisfactory. Additionally, most recent works [2, 3, 9] try to incorporate polarity lexicons into their models as prior knowledge. These approaches still have limitations: for example, if the polarity lexicons are not rich, the improvement from the prior is very limited. As a result, we have to seek other approaches.

Most recently, word embeddings have gained more and more attention, since they show very good performance in a broad range of natural language processing (NLP) tasks [10-12]. For example, [10] incorporates latent feature vector representations of words into the LDA model, and [11] employs latent topic models to assign a topic to each word in the text corpus and learns topical word embeddings (TWE). But these models only address the task of mining topics; little attention has been devoted to topic sentiment models with word embeddings so far. In this paper, we propose a new topic sentiment model which incorporates word embeddings. To the best of our knowledge, it is the first work to formulate a topic sentiment model with word embeddings.
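The external embeddings used throughout the paper are simply a lookup table from words to dense vectors. As a minimal sketch under stated assumptions (not the authors' code), the snippet below loads pre-trained 300-dimensional vectors with the gensim library rather than the original word2vec command-line toolkit and keeps only the vectors for words that occur in a toy review corpus; the file name, the toy corpus, and the whitespace tokenization are placeholders for illustration.

```python
# Minimal sketch (not from the paper): load pre-trained word vectors and
# restrict them to the corpus vocabulary, as a topic model with word
# embeddings requires. File name and tokenization are placeholders.
import numpy as np
from gensim.models import KeyedVectors

# Assumed: 300-dimensional vectors stored in word2vec text format.
vectors = KeyedVectors.load_word2vec_format("wiki_vectors_300d.txt", binary=False)

corpus = [
    "the fan is quiet and cooling works well",
    "the laptop runs hot and the fan noise is annoying",
]
vocab = sorted({w for doc in corpus for w in doc.split() if w in vectors})

# Embedding matrix omega with one row per in-vocabulary word (V x 300).
omega = np.stack([vectors[w] for w in vocab])
word_id = {w: i for i, w in enumerate(vocab)}
print(omega.shape, word_id["fan"])
```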
In contrast with other topic sentiment modeling frameworks, our model is distinguished as follows: (1) we incorporate word embeddings trained on very large corpora, which significantly improves the sentiment-topic-word mapping and extends the semantic and syntactic information of words; (2) experiments are performed on four real online review data sets in two languages (English and Chinese), showing that our model is applicable more broadly; (3) we compare the performance with and without incorporating sentiment polarity priors to demonstrate that our new model is fully unsupervised. We find that our unsupervised model is highly portable to other domains for the sentiment classification task and achieves significant performance both in sentiment analysis and in extracting sentiment-specific topics.

2 Topic and Sentiment Model with Word Embeddings

Fig. 1. Graphical representation of the TSWE model.

2.1 Topic and Sentiment Model with Word Embeddings

In this section, we propose a novel topic sentiment model with word embeddings called TSWE, as shown in Fig. 1. TSWE is formed by taking the original topic sentiment model JST [2, 3] and replacing its Dirichlet multinomial component with a two-component mixture of a sentiment-topic-to-word Dirichlet multinomial component and a word embeddings component. Our model defines the probability of generating word w from the embeddings component of sentiment label l and topic k as the multinomial distribution

P(w \mid \tau_{l,k}) = \frac{\exp(\tau_{l,k} \cdot \omega_w)}{\sum_{w' \in V} \exp(\tau_{l,k} \cdot \omega_{w'})}    (1)

where \omega_w is the pre-trained embedding of word w and \tau_{l,k} is the vector representation of the sentiment-topic pair (l, k).

The negative log likelihood according to our model factorizes into factors L_{l,k} for each topic k associated with a sentiment label l. We derive:

L_{l,k} = \sum_{w \in V} N^{s=1}_{l,k,w} \Big( \log \sum_{w' \in V} \exp(\tau_{l,k} \cdot \omega_{w'}) - \tau_{l,k} \cdot \omega_w \Big)    (2)

where N^{s=1}_{l,k,w} is the number of times word w is generated from the embeddings component of the sentiment-topic pair (l, k). Then we apply the L-BFGS implementation [13] from the Mallet toolkit [14] to derive the vector \tau_{l,k} that minimizes L_{l,k}.

2.2 Generative process for the TSWE model

The formal definition of the generative process of the TSWE model is as follows:

For each sentiment-topic pair (l, k), generate the word distribution of the sentiment-topic pair \varphi_{l,k} ~ Dir(\beta)
For each document d, draw a multinomial distribution \pi_d ~ Dir(\gamma)
For each sentiment label l under document d, draw a multinomial distribution \theta_{d,l} ~ Dir(\alpha)
For each word w_i in document d:
 - draw a sentiment label l_i ~ Mult(\pi_d)
 - draw a topic z_i ~ Mult(\theta_{d,l_i})
 - draw a binary indicator variable s_i ~ Bernoulli(\lambda)
 - draw a word w_i ~ Mult(\varphi_{l_i,z_i}) if s_i = 0, or from the embeddings component P(\cdot \mid \tau_{l_i,z_i}) of equation (1) if s_i = 1

2.3 Gibbs sampling for the TSWE model

In this section, we introduce the Gibbs sampling algorithm [15] for the TSWE model. The detailed derivation of Gibbs sampling for topic models can be found in the literature [16]. The posterior probability of the sentiment label and topic assignment of word w_i, with the binary indicator variable integrated out, can be obtained from the joint probability as follows:

P(l_i = l, z_i = k \mid \mathbf{w}, \mathbf{l}_{-i}, \mathbf{z}_{-i}) \propto \frac{N^{-i}_{d,l} + \gamma}{N^{-i}_{d} + S\gamma} \cdot \frac{N^{-i}_{d,l,k} + \alpha}{N^{-i}_{d,l} + K\alpha} \cdot \Big( (1-\lambda) \frac{N^{-i}_{l,k,w_i} + \beta}{N^{-i}_{l,k} + V\beta} + \lambda P(w_i \mid \tau_{l,k}) \Big)    (3)

where N_{d,l} is the number of words in document d assigned to sentiment label l, N_{d,l,k} is the number of words in document d assigned to sentiment label l and topic k, N_{l,k,w} is the number of times word w is generated from the Dirichlet component of the sentiment-topic pair (l, k), N_d is the length of document d, the superscript -i means that the current word is excluded from the counts, and V, K, S denote the vocabulary size, the number of topics, and the number of sentiment labels. Samples derived from the Markov chain are then used to estimate \varphi, \theta, and \pi as depicted in equations (4), (5), (6):

\varphi_{l,k,w} = \frac{N_{l,k,w} + \beta}{N_{l,k} + V\beta}    (4)

\theta_{d,l,k} = \frac{N_{d,l,k} + \alpha}{N_{d,l} + K\alpha}    (5)

\pi_{d,l} = \frac{N_{d,l} + \gamma}{N_{d} + S\gamma}    (6)
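To make the sampling update concrete, here is a small illustrative sketch of one collapsed Gibbs sweep for the conditional in equation (3). It is not the authors' implementation: the corpus, the count arrays, the fixed mixture weight lambda, and the randomly initialized sentiment-topic vectors tau are stand-ins, the binary indicator is integrated out of the update, and a real implementation would also re-estimate tau with L-BFGS between sampling iterations.

```python
# Illustrative sketch of one collapsed Gibbs sweep for a JST-style model with
# a word-embedding mixture component (TSWE-like). Not the authors' code.
import numpy as np

rng = np.random.default_rng(0)
V, K, S, D = 50, 3, 2, 10             # vocabulary size, topics, sentiment labels, documents
alpha, beta, gamma, lam = 50.0 / K, 0.01, 1.0, 0.1
dim = 25                              # embedding dimension (300 in the paper)

omega = rng.normal(size=(V, dim))     # stand-in for pre-trained word embeddings
tau = rng.normal(size=(S, K, dim))    # sentiment-topic vectors (re-estimated by L-BFGS in practice)
docs = [rng.integers(V, size=rng.integers(5, 15)).tolist() for _ in range(D)]

# Count arrays and random initialization of (sentiment, topic) assignments.
n_dlk = np.zeros((D, S, K))           # words in doc d with sentiment l and topic k
n_dl = np.zeros((D, S))               # words in doc d with sentiment l
n_lkw = np.zeros((S, K, V))           # word w under sentiment l and topic k
n_lk = np.zeros((S, K))
assign = []
for d, doc in enumerate(docs):
    a = []
    for w in doc:
        l, k = rng.integers(S), rng.integers(K)
        n_dlk[d, l, k] += 1; n_dl[d, l] += 1
        n_lkw[l, k, w] += 1; n_lk[l, k] += 1
        a.append((l, k))
    assign.append(a)

# Embedding component of eq. (1): a softmax over the vocabulary per (l, k) pair.
cat_e = np.exp(np.einsum("lkd,vd->lkv", tau, omega))
cat_e /= cat_e.sum(axis=2, keepdims=True)

def gibbs_sweep():
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            l, k = assign[d][i]
            # remove the current assignment from the counts
            n_dlk[d, l, k] -= 1; n_dl[d, l] -= 1
            n_lkw[l, k, w] -= 1; n_lk[l, k] -= 1
            # conditional probability of every (sentiment, topic) pair, eq. (3)
            p = ((n_dl[d] + gamma)[:, None] / (len(doc) - 1 + S * gamma)
                 * (n_dlk[d] + alpha) / (n_dl[d][:, None] + K * alpha)
                 * ((1 - lam) * (n_lkw[:, :, w] + beta) / (n_lk + V * beta)
                    + lam * cat_e[:, :, w]))
            flat = (p / p.sum()).ravel()
            l, k = np.unravel_index(rng.choice(flat.size, p=flat), (S, K))
            # add the new assignment back (indicator variable integrated out here)
            n_dlk[d, l, k] += 1; n_dl[d, l] += 1
            n_lkw[l, k, w] += 1; n_lk[l, k] += 1
            assign[d][i] = (l, k)

for _ in range(20):
    gibbs_sweep()

pi = (n_dl + gamma) / (n_dl.sum(axis=1, keepdims=True) + S * gamma)   # eq. (6)
print(pi[0])                          # estimated sentiment distribution of doc 0
```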
3 Experiment

In this section, we explore the performance of the TSWE model on document-level sentiment classification and topic extraction on different kinds of data sets for English and Chinese.

3.1 Experimental setup

3.1.1 Training word embeddings

We train 300-dimensional word embeddings on two corpora using the Google word2vec toolkit [17]: Chinese Wikipedia (http://download.wikipedia.com/zhwiki/latest/zhwiki-latest-pages-articles.xml.bz2) and English Wikipedia (http://nlp.stanford.edu/data/WestburyLab.wikicorp.201004.txt.bz2).

3.1.2 Experimental datasets

We perform experiments on two kinds of sentiment mining data sets, Chinese and English. The Chinese data sets consist of three categories of product review data sets (http://www.datatang.com/data/11937), covering book, hotel, and computer, with 1000 positive and 1000 negative examples for each domain. The English corpus is the polarity dataset version 2.0 (www.cs.cornell.edu/people/pabo/movie-review-data/) introduced by Pang and Lee in 2004, consisting of 1000 positive and 1000 negative movie reviews, which we call the MR04 data set.

Preprocessing: We remove repetitive comments and stop words, the words whose frequency is less than 2 or greater than 15, and the words that are not found in the embedding representations trained on the Chinese and English Wikipedia corpora. In addition, we perform word segmentation on the Chinese data sets.

3.2 Parameter Setting

We set the symmetric prior hyper-parameter \beta = 0.01 in our TSWE model. The symmetric hyper-parameter \gamma is set to 0.05 \times L / S, where L is the average document length and S is the total number of sentiment labels, as noted by [3]. The hyper-parameter \alpha is set to the standard setting 50 / K.

3.3 Experimental Results and Analysis

In this section, we present and discuss the experimental results of both document-level sentiment classification and topic extraction.

3.3.1 Sentiment classification evaluation

We use a common metric to evaluate classification performance: accuracy. Table 1 presents the classification accuracy obtained by TSWE on the computer data set with the number of topics set to either 1 or 20. By varying \lambda, as shown in Table 1, the TSWE model obtains its best result at \lambda = 0.1, and settings of \lambda between 0.1 and 0.5 are better than \lambda = 0.0 on the computer data set. This shows that the word embeddings are effective in capturing positive and negative sentiments. We therefore fix \lambda at 0.1 and report experimental results based on this value for the rest of this section.

Table 1. Accuracy on the computer and MR04 data sets with different values of \lambda.

\lambda         0.0    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1.0
computer K=1    0.765  0.791  0.786  0.791  0.781  0.776  0.726  0.689  0.653  0.653  0.561
computer K=20   0.781  0.797  0.788  0.782  0.791  0.786  0.791  0.772  0.745  0.602  0.552

Fig. 2. Accuracy with different topic number settings (k = 1 to 100) on the four data sets: (a) computer, (b) MR04, (c) hotel, (d) book; each panel compares TSWE+P, JST, and TSWE-P.

With lexicons vs no lexicons: In the experiments, we compare the classification results with and without introducing a lexicon. As shown in Fig. 2, TSWE+P denotes the accuracy when the sentiment prior is incorporated, and TSWE-P denotes that the sentiment prior is not introduced. The lexicons are two subjectivity lexicons: the English lexicon is MPQA (http://www.cs.pitt.edu/mpqa/) and the Chinese lexicon is the HowNet emotional word set (http://www.datatang.com/datares/go.aspx?dataid=603399). On most tests, the classification results with the lexicon are almost the same as those without the lexicon for the same topic number. This shows that the word embeddings have already captured positive and negative sentiments.
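Document-level polarity in this family of models is read off the per-document sentiment distribution \pi_d estimated in equation (6). As a small sketch (not the authors' evaluation code), accuracy can be computed by predicting the sentiment label with the largest \pi_d mass and comparing it with the gold label; the values of pi and gold below are invented placeholders and the 0/1 label coding is an assumption.

```python
# Illustrative sketch: document-level sentiment classification accuracy from
# the estimated per-document sentiment distribution pi (eq. 6). Not the
# authors' code; 'pi' and 'gold' are placeholder inputs.
import numpy as np

# pi[d, j] = P(sentiment j | document d), here for 6 documents and 2 labels.
pi = np.array([
    [0.81, 0.19], [0.35, 0.65], [0.58, 0.42],
    [0.22, 0.78], [0.70, 0.30], [0.45, 0.55],
])
gold = np.array([0, 1, 0, 1, 0, 0])   # 0 = positive, 1 = negative (assumed coding)

pred = pi.argmax(axis=1)              # pick the most probable sentiment label
accuracy = (pred == gold).mean()
print(f"accuracy = {accuracy:.3f}")   # 5 of 6 correct -> 0.833
```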
TSWE vs JST with different numbers of topics: Fig. 2 shows the classification results produced by TSWE and JST on the four data sets with different numbers of topics. TSWE significantly outperforms JST on all of the data sets, particularly on the MR04 data set, where we obtain a 20.0% improvement in accuracy at K = 80. These results show that the word embeddings help to extend the semantic information of words and also capture the sentiment information of words.

3.3.2 Topic extraction evaluation

The other goal of the evaluation is to extract topics and evaluate the effectiveness of the sentiment topics. First we need to evaluate the topic clustering performance under the corresponding sentiment polarity. We use two common metrics to evaluate the performance: perplexity and normalized mutual information (NMI) [18]. More formally, for a test set of M documents, the perplexity is:

perplexity(D_{test}) = \exp\Big( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \Big)

Fig. 3. Perplexity of the TSWE model with different topic number settings (K = 1 to 100) on the four data sets.

Fig. 3 shows that the perplexity on the MR04 data set is higher than on the other data sets. The reason is that this corpus contains more words than the others. From Table 2 we can see that the TSWE model obtains better NMI than JST: the NMI of the TSWE model is around 0.268~0.600, while JST obtains only around 0.10~0.420, which shows the effectiveness of topic clustering under sentiment with TSWE.

Table 2. NMI results of TSWE and JST on the book and MR04 data sets.

Data   Model   K=1    K=5    K=10   K=20   K=40   K=60   K=80   K=100
book   TSWE    0.392  0.353  0.309  0.338  0.293  0.302  0.315  0.268
book   JST     0.260  0.083  0.070  0.195  0.270  0.062  0.240  0.168
MR04   TSWE    0.542  0.600  0.572  0.554  0.507  0.550  0.472  0.540
MR04   JST     0.358  0.420  0.248  0.370  0.195  0.101  0.100  0.164
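Both evaluation quantities are straightforward to compute once the model provides held-out likelihoods and cluster assignments. The sketch below is illustrative only and not tied to the paper's implementation: the log-likelihoods, document lengths, and cluster labels are invented placeholders, and NMI is delegated to scikit-learn.

```python
# Illustrative sketch of the two evaluation metrics used above. The held-out
# log-likelihoods and cluster assignments are placeholder values, not results
# from the paper.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

# Perplexity: exp(- sum_d log p(w_d) / sum_d N_d) over a held-out set.
log_likelihoods = np.array([-310.2, -125.7, -208.4])   # log p(w_d) per test document
doc_lengths = np.array([52, 21, 34])                   # N_d per test document
perplexity = np.exp(-log_likelihoods.sum() / doc_lengths.sum())
print(f"perplexity = {perplexity:.1f}")

# NMI between induced topic clusters and reference classes.
reference = [0, 0, 1, 1, 2, 2, 2, 1]
predicted = [1, 1, 0, 0, 2, 2, 0, 0]
print(f"NMI = {normalized_mutual_info_score(reference, predicted):.3f}")
```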
A topic in TSWE is a multinomial distribution over words conditioned on both the topic and the sentiment. The most probable words of each sentiment-topic distribution approximately reflect the meaning of the topic. Table 3 shows selected examples of global topics extracted from the computer data set with JST and TSWE. Each row shows the top 15 words for the corresponding topic. We can see that TSWE words such as "cooling, fan, radiator, voice, temperature, workmanship, operation" concern the heat-dissipation problem of computers, and words such as "good, quietness, perfect, like, nice, suitable" express the emotional tendencies towards the heat-dissipation problem. This shows that TSWE can extract topics and sentiments simultaneously. Overall, the above analysis illustrates the effectiveness of TSWE in extracting opinionated topics under sentiment from a corpus.

Table 3. Topics extracted under different sentiment labels by JST and TSWE on the computer data set.

JST, Pos: 漂亮/nice; 散热/cooling; 外观/appearance; 喜欢/like; 设计/design; 配置/configuration; 比较/relatively; 时尚/fashion; 硬盘/hard disk; 噪音/noise; 内存/memory; 本本/laptop; 完美/perfect; 钢琴/piano; 键盘/keyboard
JST, Neg: 声音/voice; 风扇/fan; 温度/temperature; 发热量/calorific value; 散热/cooling; 硬盘/hard disk; 接受/accept; 开机/starting up; 噪音/noise; 发热/heat; 感觉/feeling; 确实/indeed; 运行/operation; 控制/control; 触摸/touch
TSWE, Pos: 散热/cooling; 风扇/fan; 不错/good; 声音/voice; 安静/quietness; 温度/temperature; 完美/perfect; 散热器/radiator; 喜欢/like; 做工/workmanship; 漂亮/nice; 运行/operation; 游戏/game; 合适/suitable; 效果/effect
TSWE, Neg: 散热/cooling; 风扇/fan; 声音/voice; 温度/temperature; 一般/general; 不好/bad; 噪音/noise; 散热器/radiator; 发热量/calorific value; 机器/machine; 运行/operation; 发热/heat; 游戏/game; 硬盘/hard disk; 效果/effect

4 Conclusions and Future Work

In this paper, we propose a novel unsupervised generative model (TSWE) for jointly mining sentiments and sentiment-specific topics from online reviews. To the best of our knowledge, this is the first work to formulate a topic sentiment joint model with word embeddings. Most importantly, the experiments on real review data sets for English and Chinese show that TSWE is effective in discovering sentiments and topics simultaneously. In future work, we will explore how to properly introduce a lexicon such as HowNet to improve the performance of detecting sentiments and sentiment-specific topics.

Acknowledgements. This research is supported by the National Nature Science Foundation of China under Grants 61472258 and 61402294, the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (2014BAH28F05), and the Science and Technology Foundation of Shenzhen City under Grants JCYJ20140509172609162 and JCYJ20130329102032059.

5 References

1. Dermouche, M., Kouas, L., Velcin, J., Loudcher, S.: A Joint Model for Topic-Sentiment Modeling from Text. In: ACM/SIGAPP Symposium on Applied Computing (SAC) (2015)
2. Lin, C., He, Y.: Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 375-384. ACM (2009)
3. Lin, C., He, Y., Everson, R., Rüger, S.: Weakly supervised joint sentiment-topic detection from text. IEEE Transactions on Knowledge and Data Engineering 24, 1134-1145 (2012)
4. Pavitra, R., Kalaivaani, P.: Weakly supervised sentiment analysis using joint sentiment topic detection with bigrams. In: 2015 2nd International Conference on Electronics and Communication Systems (ICECS), pp. 889-893. IEEE (2015)
5. Brody, S., Elhadad, N.: An unsupervised aspect-sentiment model for online reviews. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 804-812. Association for Computational Linguistics (2010)
6. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022 (2003)
7. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50-57. ACM (1999)
8. Mei, Q., Ling, X., Wondra, M., Su, H., Zhai, C.: Topic sentiment mixture: modeling facets and opinions in weblogs. In: Proceedings of the 16th International Conference on World Wide Web, pp. 171-180. ACM (2007)
9. Li, C., Zhang, J., Sun, J.-T., Chen, Z.: Sentiment Topic Model with Decomposed Prior. In: SDM, pp. 767-775 (2013)
10. Nguyen, D.Q., Billingsley, R., Du, L., Johnson, M.: Improving Topic Models with Latent Feature Word Representations. Transactions of the Association for Computational Linguistics 3, 299-313 (2015)
11. Liu, Y., Liu, Z., Chua, T.-S., Sun, M.: Topical Word Embeddings. In: AAAI, pp. 2418-2424 (2015)
12. Das, R., Zaheer, M., Dyer, C.: Gaussian LDA for topic models with word embeddings. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (2015)
13. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Mathematical Programming 45, 503-528 (1989)
14. McCallum, A.K.: MALLET: A Machine Learning for Language Toolkit (2002)
15. Walsh, B.: Markov Chain Monte Carlo and Gibbs Sampling (2004)
16. Heinrich, G.: Parameter estimation for text analysis. Technical report (2005)
17. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111-3119 (2013)
18. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)