Interactive Visualization for Topic Model Curation

Guoray Cai, Penn State University, University Park, PA, USA, cai@ist.psu.edu
Feng Sun, Penn State University, University Park, PA, USA, fzs122@psu.edu
Yongzhong Sha, Lanzhou University, Lanzhou, Gansu, China, shayzh@lzu.edu.cn

ABSTRACT
Understanding the content of a large text corpus can be assisted by topic modeling methods, but the discovered topics often do not make clear sense to human analysts. Interactive topic modeling addresses such problems by allowing a human to steer the topic model curation process (generate, interpret, diagnose, and refine). However, humans have limited ability to work with the artifacts of computational topic models, since these are difficult to interpret and harvest. This paper explores the nature of such challenges and provides a visual analytic solution in the context of supporting political scientists in understanding the thematic content of online petition data. We use interactive topic modeling of the White House online petition data as a lens to bring up key points of discussion and to highlight the unsolved problems as well as the potential utilities of visual analytics methods.

ACM Classification Keywords
H.5.2 Information Interfaces and Presentation: User Interfaces: visual analytics; H.4.2 Information Systems: visual analytic systems

Author Keywords
Topic models; information visualization; visual analytics

(c) 2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. ESIDA '18, March 11, 2018, Tokyo, Japan.

INTRODUCTION
Topic modeling has been advanced as a solution to the challenge of making sense of large corpora of textual data. With the help of machines, valuable themes buried in a large document collection can emerge and provide a better representation of the documents. The most popular topic modeling techniques, LDA (Latent Dirichlet Allocation) [4] and its variants, such as supervised LDA [26] and supervised anchor LDA [3], have proven useful in many applications [29, 25], including online petition analysis [14]. Topic modeling assists qualitative and quantitative research over user-generated texts coming from blogs or social media. By studying the set of topics learned from social media conversations over some period of time, it may become possible to find out what users are talking about, identify underlying topical trends, and follow them through time. Topic similarities among documents also help to identify the most relevant documents for a specific topic. Ideally, an analyst may be able to draw conclusions from the word distributions of topics and use such insight to conduct a more in-depth study of documents with high affinities for specific topics.

Despite such advances, topic models have not been widely adopted by data analysts for the practical purpose of understanding large corpora [23]. Topics discovered by LDA and other algorithms often include both "good" and "bad" topics as judged by users. Topics can be bad because they (1) confuse two or more themes into one topic; (2) pick up two different topics that are (nearly) duplicates to humans; (3) are nonsense topics [18]; (4) contain too many generic words (e.g., "people, like, mr") [5]; (5) contain disparate or poorly connected words [22]; (6) are misaligned with human interpretation [9]; (7) are irrelevant [27]; (8) miss associations between topics and documents [11]; or (9) duplicate other, similar topics [5]. The presence of poor-quality topics has been cited as the primary obstacle to the acceptance of statistical topic models outside of the machine learning community [22]. The root of these problems lies in the fact that the objective function that topic models optimize does not always correlate well with human judgments of topic quality [7]. Due to these problems, the use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure that they are meaningful [16].
Addressing the above issues to make topic models usable by analysts who are not machine learning experts, a variety of human-in-the-loop methods have been proposed that allow analysts to manipulate and incrementally refine a topic model of a target text corpus [17, 18, 19, 2]. These methods typically involve the use of interactive visualization and direct manipulation of topic models to diagnose poor topics and fix them through operations such as adding or removing words in topics, adjusting the weights of words within topics, splitting generic topics, and merging similar topics [17]. For example, ITM [18] allows users to add, emphasize, and ignore words within topics, while UTOPIAN [8] allows users to adjust the weights of words within topics, merge and split topics, and create new topics. Additionally, iVisClustering [19] lets users manually create or remove topics, merge or split topics, and reassign documents to another topic, with the help of visually exploring topic-document associations in a scatter plot.

While these operations can be supported by direct manipulation and algorithmic extensions, it is more challenging to diagnose the quality concerns of machine-discovered topics and to assess whether a refinement strategy results in topic improvement. This is where interactive visualization methods are most helpful. Topic Browser [6] uses a tabular visualization technique to assist in assessing term orders within each topic, and Termite [10] focuses on supporting effective evaluation of the term distributions associated with LDA topics through visualizations. TopicNets [13] uses a web-based interactive visual interface to enable users to discover topics of increasing granularity through an informed selection of relevant subsets of documents.

While these visualization tools help users to assess and refine static topic models, they fall short in supporting the whole topic curation process. Topic model curation goes beyond human validation of machine-generated topics to include the whole human-directed process of discovering topics that are useful for a specific application domain. For example, public opinion researchers may be interested in discovering the range of policy preferences expressed in the blogosphere. Crisis managers may be interested in conversations in social media that are especially informative to their decisions on how to allocate resources and dispatch rescue teams. For such applications, the use of topic models is not a one-shot process but a broader process of seeking, assessing, relating, and structuring topics with the help of supervised and unsupervised topic models. A typical topic curation process starts with a vanilla topic model (a purely unsupervised probabilistic model such as LDA) and lets users conduct full diagnostics to recognize good and bad topics. Good topics are collected and kept in a "bag", while bad topics are improved or removed. For the set of bad topics, users may explore multiple ways to adjust the topic model (merging/splitting topics, adding/removing words from a topic, modifying the orders or weights of words in a topic). Depending on the consequences of the imposed correlations and constraints, a new round of modeling and refinement can be initiated to explore the topic space of the document collection either in breadth or in depth.

Towards supporting topic curation, this paper focuses on understanding the specific challenges of topic curation in the context of analyzing online petition data. We gained insight by actually practicing interactive topic modeling on the petition data we collected from the White House online petition website "We the People". This data set is considered a unique source for understanding citizens' policy concerns and preferences [15]. The insight gained from this practice is used to inform the design of a visual analytic system that supports topic model diagnostics, refinement, and evaluation. We reflect on the use of visual analytic methods to enable users to interactively curate topic models.

INTERACTIVE TOPIC MODELING OF PETITION DATA
Electronic petitioning (e-petitioning) is becoming a prevalent form of political action for enabling direct democratic engagement [20]. The data used for this study comes from the online petitioning platform "We the People", hosted by the White House. It contains 5,177 petitions accumulated over the course of six years (2011-2016). We further selected the 4,095 petitions that are in English. Each petition has four fields: (1) a petition ID, (2) a title, (3) a description, and (4) category tags.

As topic models treat documents as "bags of words", the first step of preparation before model training is tokenization, which splits each petition into a set of words. As words may have various forms, lemmatization is then applied to transform them into a common base form. Compared with the stemming technique, which shares a similar goal, lemmatization takes advantage of vocabulary analysis and can thus produce the dictionary form of words that users can interpret. Bigrams are also used here for performance purposes [32]. Finally, stopwords are removed from the texts, as well as the overly common terms that appear most frequently (the top 50), which carry little topical information. The resulting corpus contains 11,189 unique terms.
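The paper does not specify the tooling behind these preprocessing steps. The sketch below reproduces the described pipeline (tokenization, lemmatization, bigram detection, and removal of stopwords and the 50 most frequent terms) with spaCy and gensim as assumed libraries; the function names and thresholds are illustrative choices, not the authors' implementation.

# Sketch of the preprocessing described above (tokenize, lemmatize, detect
# bigrams, drop stopwords and the 50 most frequent terms). Library choices
# (spaCy, gensim) and thresholds are assumptions.
from collections import Counter

import spacy
from gensim.models.phrases import Phrases, Phraser

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def tokenize_and_lemmatize(petitions):
    """Turn each petition (title plus description) into a list of lemmas."""
    docs = []
    for doc in nlp.pipe(petitions):
        docs.append([tok.lemma_.lower() for tok in doc
                     if tok.is_alpha and not tok.is_stop])
    return docs

def build_corpus(petitions, n_most_common=50):
    docs = tokenize_and_lemmatize(petitions)

    # Join frequently co-occurring word pairs into bigrams such as
    # "law_enforcement" or "department_justice".
    bigram = Phraser(Phrases(docs, min_count=10, threshold=10.0))
    docs = [bigram[d] for d in docs]

    # Drop the overly common terms (top 50 across the corpus).
    freq = Counter(w for d in docs for w in d)
    too_common = {w for w, _ in freq.most_common(n_most_common)}
    return [[w for w in d if w not in too_common] for d in docs]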
System Design
Figure 1 shows the user interface for interacting with petition documents and topic words. The system has two functional areas. The lower part is a topic-word visualization that supports direct manipulation of word-to-topic correlations.

Figure 1: User interface for interactive topic modeling during the exploration of petitions.

The upper part is designed for exploring topic quality from the perspective of how the petitions (documents) are clustered in the space defined by the topics. The point cloud map provides a visual overview of the petition space, where topically similar petitions are positioned adjacently. It is generated using t-SNE (t-distributed stochastic neighbor embedding) [8] to reduce the high-dimensional petition data to a 2-D vector space that humans can perceive easily. Because t-SNE is nondeterministic, it usually maps a high-dimensional data point to a different 2-D position on each run; however, the relationships between the data points remain almost the same. An example of visualized petitions is shown in Figure 1.

Each petition is assigned to one cluster based on its most salient topic and is color-coded correspondingly. Users can apply filters and highlighters on topics to manipulate the petition overview map. Highlighting enables users to review petitions in context, while filtering allows users to focus on the petitions of interest. When hovering over a document point, a pop-up window displays the title, body, and topics of the document. In the meantime, the topic distribution (in terms of weights) of the selected document is visualized as a bar chart. By clicking a topic label, its topic-word distribution is visualized as color-coded bars.
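The paper does not include code for the point cloud map; the following sketch illustrates how it could be produced from a document-topic weight matrix, with scikit-learn's t-SNE for the 2-D layout and an argmax over topic weights for the color-coded cluster assignment. The name doc_topic_weights and the perplexity value are assumptions.

# Sketch: lay out petitions in 2-D from their topic-weight vectors and color
# each point by its most salient topic. `doc_topic_weights` is assumed to be
# an (n_petitions x n_topics) array taken from the topic model.
import numpy as np
from sklearn.manifold import TSNE

def project_petitions(doc_topic_weights, random_state=0):
    # A fixed random_state makes the otherwise nondeterministic t-SNE layout
    # reproducible across runs.
    tsne = TSNE(n_components=2, perplexity=30, random_state=random_state)
    coords = tsne.fit_transform(doc_topic_weights)

    # Each petition is assigned to the cluster of its most salient topic,
    # which drives the color coding of the overview map.
    cluster_ids = np.argmax(doc_topic_weights, axis=1)
    return coords, cluster_ids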
At the back end of the system, we chose Correlation Explanation (CorEx) [30] as the topic modeling algorithm to perform interactive topic curation. Built on the theory of Correlation Explanation [31] in information science, CorEx seeks a representation of the underlying information in a document collection that maximizes the informativeness of the data. Due to its fast training time and its support for anchoring, CorEx can be easily tailored to incorporate human-imposed correlations or constraints for semi-supervised topic modeling, making it an ideal choice for supporting interactive topic modeling [12]. Using CorEx, users can anchor multiple words to one topic, anchor one word to multiple topics, or use any other creative combination of anchors in order to discover topics that do not naturally emerge. By leveraging CorEx's capability for topic seeding through anchor words in our system, human analysts can incorporate their knowledge and insights into the process of refining topic models.
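The paper names anchored CorEx [12, 30] as the underlying model but does not give implementation details. The sketch below shows how an anchored model could be fit with the open-source corextopic package that accompanies [12]; the vectorizer settings, anchor lists, anchor strength, and helper name are illustrative assumptions rather than the system's actual configuration.

# Sketch of anchored CorEx topic modeling, under the assumptions stated above.
from corextopic import corextopic as ct
from sklearn.feature_extraction.text import CountVectorizer

def fit_anchored_model(petition_texts, anchors, n_topics=20, seed=42):
    # CorEx operates on a sparse binary document-word matrix.
    vectorizer = CountVectorizer(binary=True, max_features=20000)
    doc_word = vectorizer.fit_transform(petition_texts)
    words = list(vectorizer.get_feature_names_out())

    # `anchors` is a list of word lists, one per seeded topic, e.g.
    # [["health", "disease"], ["election", "vote"]]. Topics without anchors
    # are still discovered in an unsupervised way.
    model = ct.Corex(n_hidden=n_topics, seed=seed)
    model.fit(doc_word, words=words, anchors=anchors, anchor_strength=3)
    return model, words

In such a setup, re-running the model after an interactive edit would amount to re-fitting with an updated anchors list, after which the revised topic-word lists can be read back for the visualization (e.g., via the package's get_topics method).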
TOPIC CURATION
Using our system for topic curation involves three phases of activities, with a number of iterations.

Topic Discovery
The first step is to use the topic modeling algorithm with random seeds to run an unsupervised discovery of topics. The user must specify how many topics are to be produced, with the understanding that different numbers of topics can be chosen to analyze the petition data at different levels of granularity and are likely to generate different sets of topics [14, 24]. After initial unsupervised topic modeling with CorEx, users assess the topic model and conduct diagnostic analysis on topics. In particular, users inspect topics, both individually and as a group, to evaluate their quality by examining topic words. Those topics that users recognize as good ones should be kept. For the bad ones, users can file complaints and come up with one or more strategies to address them.

Topic Refinement
Topic refinement is achieved through manipulating the topic-word representations in the bottom part of Figure 1. We included an anchoring mechanism coupled with the CorEx models. It allows users to anchor one or more words to one topic, anchor one word to multiple topics, and anchor one or more words to some topics while not others. With this anchoring mechanism, topic revision interactions are supported by operations such as splitting a topic, merging by joining, and merging by absorbing (following [17]). More complicated operations can be achieved through a combination of the above basic operations. For example, investigating more fine-grained topics can be accomplished by splitting topics iteratively.

Split a topic
If a topic is considered to be "bad" based on the observation that it confuses two or more meaningful topics into one topic, a solution could be to split the topic into two or more topics. To do so, the user can check the topic he/she intends to split and then click the "split" button. Before applying the operation, the user is provided with the option to configure the number of resulting topics. Once confirmed, the underlying model training re-runs under the new constraint that only the selected topic is decomposed while the others remain the same in terms of word allocation. Updated results are then generated and visualized.

In the backend, splitting a topic into n topics involves training a word2vec model to produce word embeddings [21]. The resulting model is used to calculate the semantic similarity between words. After that, a similarity matrix of the words within this topic is produced, and spectral clustering is applied to the matrix to categorize the words into n clusters. The n clusters of words are encoded into the previous model as anchor words and produce n new topics to replace the original one.
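A rough sketch of this splitting backend is shown below, under the assumption that the word2vec embeddings are trained on the tokenized petitions and that each resulting word cluster is later passed back to the model as an anchor list; the parameter values are illustrative.

# Sketch: embed the topic's words with word2vec, build a similarity matrix,
# spectrally cluster the words into n groups, and return each group as an
# anchor list for a new topic. Parameter values are assumptions.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import SpectralClustering

def split_topic_words(topic_words, tokenized_docs, n_splits=2):
    w2v = Word2Vec(sentences=tokenized_docs, vector_size=100,
                   window=5, min_count=2, workers=4)
    vocab = [w for w in topic_words if w in w2v.wv]

    # Pairwise cosine similarities between the topic's words, shifted to be
    # non-negative so they can serve as a spectral-clustering affinity matrix.
    sim = np.array([[w2v.wv.similarity(a, b) for b in vocab] for a in vocab])
    affinity = (sim + 1.0) / 2.0

    clustering = SpectralClustering(n_clusters=n_splits,
                                    affinity="precomputed",
                                    random_state=0)
    labels = clustering.fit_predict(affinity)

    # Each word group becomes an anchor list for one of the n new topics.
    return [[w for w, l in zip(vocab, labels) if l == k]
            for k in range(n_splits)]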
Merge topics by joining
If several topics are judged to have something in common in their semantic meaning, they can be merged into one topic. This is accomplished by selecting these topics and then clicking the "apply" button. The system automatically applies the constraint that the words assigned to the topics being merged have to appear in the resulting topic. The underlying model is then updated and the visualization is re-rendered accordingly. In the backend, the words that appeared in the two topics are now anchored under the same one.

Merge topics by absorption
If one or more words in a topic are considered intruders and fit better in a different topic, the user can re-allocate topic words through drag-and-drop operations. Specifically, a user can select a word that is considered incorrectly allocated and move it to a more related topic. After the reallocation of words is done, the petition view updates to reflect the modification. In the backend, merging topics by absorption is basically a reallocation process in which the selected words of one topic are anchored to the other one and a new model is trained. The rest of the topic-word assignments remain the same through anchoring as well.
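The paper describes merging by joining as anchoring the words of the merged topics under a single topic while preserving all other topic-word assignments through anchoring. A minimal sketch of how such an anchor configuration could be assembled before re-fitting the model is shown below; topic_word_lists, the merged topic's position, and the helper name are illustrative assumptions.

# Sketch: encode a "merge by joining" as anchors for the next model fit.
# The words of the merged topics are anchored together, while every other
# topic keeps its current words as its own anchor set.
def anchors_for_merge(topic_word_lists, merge_ids):
    merged = []
    anchors = []
    for tid, words in enumerate(topic_word_lists):
        if tid in merge_ids:
            merged.extend(words)          # join the words of merged topics
        else:
            anchors.append(list(words))   # other topics stay as they are
    anchors.append(merged)                # the joined topic comes last
    return anchors

# Example: merge topics 5 and 16 of a 20-topic model, then re-fit with these
# anchors (for instance via the anchored CorEx sketch shown earlier).
# new_anchors = anchors_for_merge(current_topic_words, {5, 16})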
Evaluating Topics Interactively
Evaluating the quality of the topics in the current model is necessary both for the diagnosis of good/bad topics and for assessing the impact of topic revisions. Evaluating topic quality is done by assessing two aspects: (1) are the words in a topic coherent and do they contribute to some collective meaning? (2) are the topics aligned with the information needs of the intended application? As such, we designed the interface in Figure 1, which visualizes topically represented petitions, to support the following functions for evaluating the quality of topics:

Inspecting the quality of every single topic. Users can evaluate topics by looking at the coherence of the component words and their relative weights (see the bars next to words) in a topic. Topics are also color-coded in the visualization window. Clicking on the legend of a topic results in all the petitions with sufficient weights on that topic being highlighted (while other petitions are dimmed). These functions allow users to explore the patterns of how petitions of the same topic cluster. A good topic tends to create a cluster of petitions that is less mixed with petitions of other topics.

Comparing topics. Users can evaluate one or more topics together by observing semantic relations to spatially close or remote topics, and by looking at the spatial relationships (overlapping clusters, adjacent clusters, non-intersecting clusters) between the petitions of two topics. Applying filters to leave fewer topics on the figure helps reduce visual clutter.

TOPIC MODEL CURATION SCENARIO
We practiced the topic curation process on the online petition dataset to experience how well our system supports topic diagnostics and refinement. First, we ran CorEx topic modeling and generated 20 topics. A fixed random seed was used to make sure the same results can be reproduced. Table 1 shows 5 sample topics out of the 20 produced by the model. The initial result from the CorEx topic modeling reveals interesting topic clusters in the data set. In the provided samples, topic 0 mainly talks about "disease", topic 4 generally discusses "economy", topic 5 describes "election", and topic 16 represents "law enforcement". The bottom part of the table shows the results after applying certain topic revision operations.

Table 1: Selected topics (#topics = 20)

id     topic words (top 15)
0      disease, patient, cancer, treatment, doctor, cure, disorder, medication, pain, awareness, symptom, illness, medicine, diagnosis, disability
4      health, economy, tax, cost, benefit, increase, company, money, market, pay, healthcare, fund, research, dollar, debt
5      election, investigation, vote, voter, candidate, hillary_clinton, voting, campaign, department_justice, fbi, ballot, office, corruption, violation, democrat
6      internet, consumer, energy, information, technology, provider, service, device, car, access, fuel, safety, standard, road, vehicle
16     officer, police, law_enforcement, evidence, police_officer, county, aircraft, judge, governor_chris, killing, conviction, department, scene, cat, chief
0'     health, treatment, disease, condition, patient, doctor, cancer, awareness, pain, illness, medicine, disability, disorder, cure, medication
4'     money, benefit, company, pay, economy, business, cost, fund, tax, industry, dollar, budget, study, market, increase
6.1    service, information, com, access, standard, technology, internet, consumer, provider, content, http, privacy, https_facebook, internet_service, customer
6.2    safety, vehicle, energy, car, device, accident, fuel, road, aviation, forest, traffic, emission, faa, air, carbon
5+16   investigation, vote, election, officer, police, law_enforcement, campaign, candidate, corruption, voter

Moving Intruder Words
By examining the above table, we find that topic 4 contains a word, "health", that is clearly different from the other words (see Figure 2). We also find that some petitions that are related to health but have nothing to do with "economy" are assigned to this topic during the petition exploration phase. One example petition is "place mental health as a required course in junior high and middle schools". In order to correct this topic assignment, we performed topic refinement by moving the intruder word "health" from topic 4 to topic 0. The re-generated topic words are shown in Table 1 as topic 0' and topic 4'.

Figure 2: Move topic word "health" from topic 4 to topic 0. (a) Original topic words; (b) new topic words.

In order to assess whether such a strategy of refining topics has led to a better outcome, we rendered the petition clusters in relation to the new topic definition; the result is shown in Figure 3. From this figure, we can clearly see how topic groups are isolated and cut. Compared with Figure 1, outliers are nicely scattered apart and small clusters of outliers disappear. This result suggests that the change to the topic model made by moving "health" from topic 4 to topic 0 is a good move. This claim is further confirmed by a calculated metric of topic coherence based on word context vectors [1]. This metric has been demonstrated to have the highest correlation with the interpretability of topics [28]. The topic coherence of topic 4 increased from 0.453 to 0.555 after removing the intruder word, and the overall topic coherence increased from 0.431 to 0.443.

Figure 3: A comparison of visualized petitions before and after moving words between topic 0 and topic 4. (a) Petitions of topic 0 and topic 4; (b) petitions of topic 0' and topic 4'.
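The paper reports coherence values computed with the word-context-vector measure of [1]. The snippet below shows one way such per-topic and overall coherence numbers could be approximated with gensim's CoherenceModel, using the C_v measure studied in [28] as a stand-in; this choice and the variable names are assumptions, not the authors' exact implementation.

# Sketch: approximate per-topic and overall coherence for the current model.
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel

def coherence_scores(topic_word_lists, tokenized_docs, topn=15):
    dictionary = Dictionary(tokenized_docs)
    cm = CoherenceModel(topics=topic_word_lists, texts=tokenized_docs,
                        dictionary=dictionary, coherence="c_v", topn=topn)
    # Per-topic scores support claims such as "topic 4 increased from 0.453
    # to 0.555"; the aggregate score gives the overall coherence of the model.
    return cm.get_coherence_per_topic(), cm.get_coherence()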
Split a Multi-theme Topic
Observations show that the distribution of petitions of topic 6 is scattered in the reduced-dimensional space: there are several small clusters of petitions. By sampling some of them for a detailed inspection of petition contents, we found that some semantically irrelevant petitions are placed adjacently in the visualization, e.g., "Prevent the FCC from ruining the Internet" and "Put a fee on carbon-based fuels and return revenue to households"; the former is about the Internet and information technology, while the latter is related to energy. This finding can also be validated by examining the topic words of topic 6: "internet", "information", and "technology" are clearly incoherent with "energy", "fuel", and "safety". Therefore, we believe topic 6 is of low quality, since it contains several sub-topics and needs to be decomposed.

To address the quality concerns of topic 6, we split topic 6 into two topics (by selecting topic 6 and clicking the "Split" button).

Figure 4: Split topic 6 into two topics, topic 6 (6.1) and topic 7 (6.2). (a) Original topic words; (b) new topic words.

The modified version of the topic model is shown in Table 1 as topics 6.1 and 6.2 and in Figure 4 as topics 6 and 7. The figure shows that the weights of the first several topic words are increased, indicating that these words can better represent the topics. It is also apparent from Figure 5 that the distributions of petitions for topic 6 and topic 7 become more focused, indicating that the petition documents within the same clusters are more topically homogeneous. After the new topic model is applied, the above example petitions are allocated to the correct topics respectively, resulting in an increase of the overall coherence value from 0.431 to 0.441. Specifically, the original topic 6 has an individual coherence score of 0.341, while the scores of the newly produced topic 6 and topic 7 are 0.594 and 0.419, respectively.

Figure 5: A comparison of visualized petitions before and after splitting topic 6. (a) Petitions of topic 6; (b) petitions of topic 6 (6.1) and topic 7 (6.2).

Merge Semantically Similar Topics
If the number of topics is set to a large value, the CorEx algorithm will generate topics at a finer granularity. This could create situations where words that contribute to a single theme end up in separate topics. Under such circumstances, a merging operation is necessary to make sure that petitions of similar topics are grouped together. To demonstrate this situation, we trained another topic model by setting the number of topics to 50 (relatively large); the topic words are shown in Figure 6a. By looking at the topic words, topic 1 and topic 7 both describe "healthcare" but appear as different topics.

Figure 6: Merging topic 0 and topic 9. (a) Original topic words; (b) new topic words.

The topic words after merging these two topics are shown in Figure 6b. Petitions of these two topics are now grouped into one cluster as well. Subsequently, these petitions can be processed and analyzed as a whole, e.g., summarized and forwarded to the Department of Health and Human Services.

Figure 7: A comparison of visualized petitions before and after merging topic 9. (a) Petitions of topic 0 and topic 9; (b) petitions of topic 9 (0 + 9).

Merging topics is also useful when a small number of topics is used. Referring to the aforementioned topic model of 20 topics, we found that topic 5 contains the words "investigation" and "justice", which may be related to topic 16. Therefore, we performed a merging by joining on these two topics, which leads to a more general topic denoted as 5+16. Although the coherence value after merging the two topics remains almost the same, it is noteworthy that a new word, "corruption", is prioritized, as it could serve as a bridge connecting the two topics represented as "election" and "law enforcement" (e.g., a petition titled "Arrest and prosecute officials who tried to suppress the vote in the 2012 election"), showing that merging topics has the potential of revealing latent relationships among them.

Topics that are difficult to interpret may still exist even after several iterations of topic refinement. On the other hand, some petitions are complicated in that they have multiple equally important aspects, and even people have difficulty identifying the most representative one. For those documents that are related to "bad" topics and cannot be fixed in this round of analysis, the system can collect them into a subset of data to be fed into the next round of analysis.

DISCUSSION
Our work on analyzing the topic structures of online petitions is still a work in progress, but we have gained several lessons about interacting with topic modeling tools. First, users have to deal with tremendous uncertainty when deciding what the proper strategy is for tuning the topic model. Visualizing the impact of multiple strategies and providing interaction capabilities to assess the quality of topics and to compare the document clusters before and after model tuning will be critically important.

Another finding from this exercise is that there is a need to construct a topic hierarchy from unsupervised topic models in order to be aligned with the way political scientists perceive the world of petition data. However, the topics discovered by the CorEx algorithm have a flat structure, and they tend to be biased towards those topic branches that have more detailed data. We will continue to explore our visual analytic approach for the incremental refinement of topic structures and demonstrate how such an approach can be used to uncover a topic hierarchy of petitions that best reflects the human conception of the domain. Further work is required to evaluate the usability and effectiveness of this method. While we used dimension-reduction-based visualization, other petition exploration and analysis approaches should be investigated as well.

ACKNOWLEDGEMENT
The authors would like to acknowledge funding support from the National Science Foundation under award IIS-1211059, and from a grant funded by the Chinese Natural Science Foundation under award 71373108.

REFERENCES
1. Nikolaos Aletras and Mark Stevenson. 2013. Evaluating Topic Coherence Using Distributional Semantics. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013). 13–22.
2. David Andrzejewski, Xiaojin Zhu, and Mark Craven. 2009. Incorporating Domain Knowledge into Topic Modeling via Dirichlet Forest Priors. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09). ACM, New York, NY, USA, 25–32.
3. Sanjeev Arora, Rong Ge, Yonatan Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. 2013. A practical algorithm for topic modeling with provable guarantees. In International Conference on Machine Learning. 280–288.
4. David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993–1022.
5. Jordan Boyd-Graber, David Mimno, and David Newman. 2014. Care and feeding of topic models: Problems, diagnostics, and improvements. In Handbook of Mixed Membership Models and Its Applications. Chapman & Hall, Chapter 12, 225–254.
6. Allison J. B. Chaney and David M. Blei. 2012. Visualizing Topic Models. In Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media. 419–422.
7. J. Chang, J. Boyd-Graber, C. Wang, S. Gerrish, and D. M. Blei. 2009. Reading tea leaves: How humans interpret topic models. In Proceedings of Advances in Neural Information Processing Systems. 288–296.
8. Jaegul Choo, Changhyun Lee, Chandan K. Reddy, and Haesun Park. 2013. UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization. IEEE Transactions on Visualization and Computer Graphics 19, 12 (2013), 1992–2001.
9. Jason Chuang, Sonal Gupta, Christopher D. Manning, and Jeffrey Heer. 2013. Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment. In Proceedings of the 30th International Conference on Machine Learning. 612–620.
10. Jason Chuang, Christopher D. Manning, and Jeffrey Heer. 2012. Termite: Visualization Techniques for Assessing Textual Topic Models. In Proceedings of the International Working Conference on Advanced Visual Interfaces (AVI '12). 74.
11. Hal Daumé. 2009. Markov random topic fields. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. 293–296.
12. Ryan J. Gallagher, Kyle Reing, David Kale, and Greg Ver Steeg. 2016. Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge. arXiv preprint arXiv:1611.10277 (2016).
13. Brynjar Gretarsson, John O'Donovan, Svetlin Bostandjiev, Tobias Höllerer, Arthur Asuncion, David Newman, and Padhraic Smyth. 2012. TopicNets: Visual Analysis of Large Text Corpora with Topic Modeling. ACM Trans. Intell. Syst. Technol. 3, 2, Article 23 (Feb. 2012), 26 pages.
14. Loni Hagen, Ozlem Uzuner, Christopher Kotfila, Teresa M. Harrison, and Dan Lamanna. 2015. Understanding Citizens' Direct Policy Suggestions to the Federal Government: A Natural Language Processing and Topic Modeling Approach. In 2015 48th Hawaii International Conference on System Sciences. IEEE, 2134–2143.
15. Scott A. Hale, Helen Margetts, and Taha Yasseri. 2013. Petition growth and success rates on the UK No. 10 Downing Street website. In Proceedings of the 5th Annual ACM Web Science Conference. ACM, 132–138.
16. David Hall, Daniel Jurafsky, and Christopher D. Manning. 2008. Studying the history of ideas using topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08). 363–371.
17. Enamul Hoque and Giuseppe Carenini. 2016. Interactive Topic Modeling for Exploring Asynchronous Online Conversations. ACM Transactions on Interactive Intelligent Systems 6, 1 (Feb. 2016), 1–24.
18. Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. 2014. Interactive topic modeling. Machine Learning 95, 3 (2014), 423–469.
19. Hanseung Lee, Jaeyeon Kihm, Jaegul Choo, John Stasko, and Haesun Park. 2012. iVisClustering: An Interactive Visual Document Clustering via Topic Modeling. Computer Graphics Forum 31, 3pt3 (2012), 1155–1164.
20. Ralf Lindner and Ulrich Riehm. 2009. Electronic petitions and institutional modernization: International parliamentary e-petition systems in comparative perspective. JeDEM - eJournal of eDemocracy and Open Government 1, 1 (2009), 1–11.
21. Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168 (2013).
22. David Mimno, Hanna M. Wallach, Edmund Talley, Miriam Leenders, and Andrew McCallum. 2011. Optimizing semantic coherence in topic models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 262–272.
23. Sergey I. Nikolenko, Sergei Koltcov, and Olessia Koltsova. 2017. Topic modelling for qualitative studies. Journal of Information Science 43, 1 (2017), 88–102.
24. Paul Hitlin. 2016. 'We the People': Five Years of Online Petitions. Technical Report. Pew Research Center.
25. Daniel Ramage, Susan Dumais, and Dan Liebling. 2010. Characterizing Microblogs with Topic Models. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media. 1–8.
26. Daniel Ramage, David Hall, Ramesh Nallapati, and Christopher D. Manning. 2009a. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1. Association for Computational Linguistics, 248–256.
27. Daniel Ramage, David Hall, Ramesh Nallapati, and Christopher D. Manning. 2009b. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 248–256.
28. Michael Röder, Andreas Both, and Alexander Hinneburg. 2015. Exploring the Space of Topic Coherence Measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (WSDM '15). ACM, New York, NY, USA, 399–408.
29. Amin Sorkhei, Kalle Ilves, and Dorota Glowacka. 2017. Exploring Scientific Literature Search Through Topic Models. In Proceedings of the 2017 ACM Workshop on Exploratory Search and Interactive Data Analytics (ESIDA '17). ACM, 65–68.
30. Greg Ver Steeg and Aram Galstyan. 2014. Discovering Structure in High-Dimensional Data Through Correlation Explanation. In Advances in Neural Information Processing Systems (NIPS '14).
31. Greg Ver Steeg and Aram Galstyan. 2014. Discovering structure in high-dimensional data through correlation explanation. In Advances in Neural Information Processing Systems. 577–585.
32. Sida Wang and Christopher D. Manning. 2012. Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2. Association for Computational Linguistics, 90–94.