=Paper=
{{Paper
|id=Vol-2327/ExSS19
|storemode=property
|title=Interactive Topic Model with Enhanced Interpretability
|pdfUrl=https://ceur-ws.org/Vol-2327/IUI19WS-ExSS2019-2.pdf
|volume=Vol-2327
|authors=Jun Wang,Changsheng Zhao,Junfu Xiang,Kanji Uchino
|dblpUrl=https://dblp.org/rec/conf/iui/WangZXU19
}}
==Interactive Topic Model with Enhanced Interpretability==
* Jun Wang, Fujitsu Laboratories of America, Sunnyvale, CA, jun.wang@us.fujitsu.com
* Changsheng Zhao, Columbia University, New York City, NY, cz2458@columbia.edu
* Junfu Xiang, Fujitsu Nanda Software Tech. Co., Ltd., Nanjing, China, xiangjf.fnst@cn.fujitsu.com
* Kanji Uchino, Fujitsu Laboratories of America, Sunnyvale, CA, kanji@us.fujitsu.com

===ABSTRACT===
Although existing interactive topic models allow untrained end users to easily encode their feedback and iteratively refine the topic models, their unigram representations often result in ambiguous descriptions of topics and poor interpretability for users. To address these problems, this paper proposes the first phrase-based interactive topic model, which can provide both high interpretability and high interactivity with a human in the loop. First, we present an approach to augment unigrams with a list of probable phrases, which offers a more intuitively interpretable and accurate topic description, and further efficiently encode users’ feedback with phrase constraints in the interactive process of refining topic models. Second, the proposed approach is demonstrated and examined with real data.

===CCS CONCEPTS===
• Human-centered computing → Human computer interaction (HCI); • Computing methodologies → Machine learning.

===KEYWORDS===
Interactive Topic Model; Interpretability; Human in the loop

===ACM Reference Format===
Jun Wang, Changsheng Zhao, Junfu Xiang, and Kanji Uchino. 2019. Interactive Topic Model with Enhanced Interpretability. In Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019, 7 pages.

IUI Workshops’19, March 20, 2019, Los Angeles, USA. © Copyright 2019 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

===1 INTRODUCTION===
Topic models are a useful and ubiquitous tool for understanding large electronic archives. They can be used to discover the hidden themes that pervade a collection, annotate the documents according to those themes, and further organize, summarize, and search the texts [4]. However, as fully unsupervised methods, vanilla topic models such as Latent Dirichlet allocation (LDA) [4] often generate some topics which do not fully make sense to end users [10]. Some generated topics may not correspond well to meaningful concepts; for instance, two or more themes can be confused into one topic, or two different topics can be (near) duplicates. Some topics may not align well with user modeling goals or judgements. For many users in computational social science, digital humanities, and information studies, who are not machine learning experts, topic models are often a “take it or leave it” proposition [6, 10]. Unlike purely unsupervised topic models, which often result in unexpected topics, taking prior knowledge into account enables us to produce more meaningful topics [20]. Interactive topic models with a human in the loop have been proposed; they allow untrained end users to easily encode their feedback as prior knowledge and iteratively refine the topic models (e.g., changing which words are included in a topic, or merging or splitting topics) [10, 12, 16].

A topic is typically modeled as a categorical distribution over terms, and frequent terms related by a common theme are expected to have a large probability [8]. It is of interest to visualize these topics in order to facilitate human interpretation and exploration of large amounts of unorganized text, and a list of the most probable terms is often used to describe individual topics. Similar to vanilla topic models, all existing interactive topic models are represented with unigrams, which often provide an ambiguous representation of the topic and poor interpretability for end users [8]. Smith et al. [16] conducted user studies on a unigram-based interactive topic model, and also noted requests from participants for the ability to add phrases and for support of multi-word refinements as opposed to single tokens.

{| class="wikitable"
|+ Table 1: Real example of a topic (natural language processing) discovered by the unigram-based topic model and the phrase-based topic model, respectively.
! unigrams !! phrases
|-
| model || word embeddings
|-
| language || natural language
|-
| word || language model
|-
| text || question answering
|-
| task || machine translation
|-
| question || sentiment analysis
|-
| sentence || neural machine translation
|-
| translation || natural language processing
|-
| neural || text classification
|-
| natural || word representation
|}

As shown in Table 1, human interpretation often relies on the inherent grouping of words into phrases, and augmenting unigrams with a list of probable phrases offers a more intuitively interpretable and accurate topic description [8]. Under the ‘bag-of-words’ assumption of unigrams, phrases are decomposed and a phrase’s meaning may be lost, so topic models need to systematically assign topics to whole phrases. Several phrase-based topic models [3, 7, 8, 19] have been proposed to discover topical phrases and address the prevalent deficiency in visualizing topics using unigrams. But all these models are static systems which end users cannot easily and interactively refine, so they have the same “take it or leave it” issues.

To address the above problems, this paper proposes the first phrase-based interactive topic model, which can provide both high interpretability and high interactivity as shown in Figure 1. First, we present an approach to discover topical phrases of mixed lengths by phrase detection and phrase-based topic inference, and further efficiently encode users’ feedback with phrase constraints into the interactive process of refining topic models. Second, the proposed approach is demonstrated and examined with real data.

[Figure 1: Comparison of our phrase-based interactive topic models with other related topic models. The figure is a 2×2 grid of interactivity (low/high) against interpretability (low/high): vanilla (unigram-based) topic models have low interactivity and low interpretability; phrase-based topic models have low interactivity and high interpretability; unigram-based interactive topic models have high interactivity and low interpretability; the phrase-based interactive topic model has both high.]

We organize the remainder of the paper as follows. Section 2 introduces some related work. Section 3 illustrates the general framework we propose. Section 4 presents our experimental results on real-world data. Finally, Section 5 summarizes our work and discusses future work.

===2 RELATED WORK===
Various approaches have been proposed to encode users’ feedback as prior knowledge into topic models instead of purely relying on how often words co-occur in different contexts. Andrzejewski et al. [1] imposed a Dirichlet Forest prior over the topic-word categoricals to encode Must-Links and Cannot-Links between words. Words with Must-Links are encouraged to have similar probabilities within all topics, while those with Cannot-Links are not allowed to simultaneously have large probabilities within any topic. Xie et al. [20] studied how to incorporate external word correlation knowledge to improve the coherence of topic modeling, and built a Markov Random Field (MRF) regularized topic model encouraging words labeled as similar to share the same topic label. Yang et al. [21] integrated lexical association into topic optimization using tree priors to improve topic interpretability, which provided a flexible framework that can take advantage of both first-order word associations and the higher-order associations captured by word embeddings.

Several unigram-based interactive topic models have been proposed and studied. Hu et al. [10] extended the framework of the Dirichlet Forest prior and proposed the first interactive topic model. Lee et al. [12] employed a user-centered approach to identify a set of topic refinement operations that users expect to have in an interactive topic model system. However, they did not implement an underlying algorithm to refine topic models and only used Wizard-of-Oz refinements: the resulting topics were updated superficially, not as the output of a data-driven statistical model (the goal of topic models) [16]. Smith et al. [16] further implemented an efficient asymmetric prior-based interactive topic model with a broader set of user-centered refinement operations, and conducted a study with twelve non-expert participants to examine how end users are affected by issues that arise with a fully interactive, user-centered system.

Some researchers have proposed various phrase-based topic models. Wang et al. [19] attempted to infer phrases and topics simultaneously by creating a complex generative mechanism. The resulting model can directly output phrases and their latent topic assignments. It used additional latent variables and word-specific multinomials to model bigrams, and these bigrams can be combined to form n-gram phrases. KERT [7] and Turbo Topics [3] constructed topical phrases as a post-processing step to unigram-based topic models. These methods generally produce low-quality topical phrases or suffer from poor scalability outside small datasets [8]. El-Kishky et al. [8] and Wang et al. [18] proposed computationally efficient and effective approaches, which combine a phrase mining framework that segments a document into single and multi-word phrases with a topic model with phrase constraints that operates on the induced document partition.

===3 FRAMEWORK===
For phrase-based topic models, the better method is to first mine phrases and segment each document into single and multi-word phrases, and then run topic inference with phrase constraints [8]. End users can give feedback using a variety of refinement operations on the topical phrase visualization; this feedback updates the prior knowledge, and the phrase-based topic inference is rerun based on the updated prior. As shown in Figure 2, the process forms a loop in which users can continuously and interactively update and refine the topic model with phrase constraints.

[Figure 2: The framework of phrase-based interactive topic model. The diagram is a loop: Corpus → Phrase Mining → Phrase-based Topic Inference → Topical phrase Visualization → Refinement Operations of Users → Update Prior Knowledge with phrase constraints → Phrase-based Topic Inference.]

====Phrase Mining====
Phrase mining is a text mining technique that discovers semantically meaningful phrases from massive text. Recent data-driven approaches make use of frequency statistics in the corpus to address both candidate generation and quality estimation [7, 13, 15, 18]. They do not rely on complex linguistic feature generation, domain-specific rules or extensive labeling efforts. Instead, they rely on large corpora containing hundreds of thousands of documents to help deliver superior performance. Several indicators, including frequency, mutual information, branching entropy and comparison to super/sub-sequences, were proposed to extract n-grams that reliably indicate frequent, concise concepts [7, 13, 15, 18].

====Phrase-based topic inference====
After inducing a partition on each document, we perform topic inference to associate the same topic with each word in a phrase and thus naturally with the phrase as a whole. El-Kishky et al. [8] proposed a probabilistic graphical model, PhraseLDA, based on a generative process that is almost the same as LDA but with constraints on the topics of phrases, so the corresponding phrase-based topic inference can be smoothly updated from the unigram-based topic inference of LDA.

LDA assumes that a document may contain multiple topics, where a topic is defined to be a categorical distribution over words in the vocabulary. The generative process is as follows:

: (1) Draw <math>\phi_k \sim \mathrm{Dirichlet}(\beta)</math>, for 1 ≤ k ≤ K
: (2) For document d, where 1 ≤ d ≤ D:
:: (a) Draw <math>\theta_d \sim \mathrm{Dirichlet}(\alpha)</math>
:: (b) For the n-th word in document d, where <math>1 \le n \le N_d</math>:
::: (i) Draw <math>z_{d,n} \sim \mathrm{Categorical}(\theta_d)</math>
::: (ii) Draw <math>w_{d,n} \sim \mathrm{Categorical}(\phi_{z_{d,n}})</math>

α is a K-dimensional vector <math>(\alpha_1, \ldots, \alpha_K)</math>, and β is a V-dimensional vector <math>(\beta_1, \ldots, \beta_V)</math>. K is the number of topics, D is the number of documents, V is the size of the vocabulary, and <math>N_d</math> is the number of words in document d.

Based on its generative process, the joint distribution of LDA (1) can be represented as the product of two Dirichlet-Multinomial distributions (2). The Dirichlet-Multinomial expressions (3) can be further simplified later using the properties of the gamma function (represented by Γ).

:<math>
\begin{align}
p(W, Z; \alpha, \beta) & \qquad (1) \\
&= \iint p(W, Z, \Theta, \Phi; \alpha, \beta)\, d\Theta\, d\Phi \\
&= \int p(Z, \Theta; \alpha)\, d\Theta \times \int p(W, \Phi \mid Z; \beta)\, d\Phi \\
&= p(Z; \alpha) \times p(W \mid Z; \beta) \\
&= \mathrm{DirMult}(Z; \alpha) \times \mathrm{DirMult}(W \mid Z; \beta) \qquad (2) \\
&\propto \prod_{d=1}^{D} \frac{\prod_{k=1}^{K} \Gamma(N_{d,k} + \alpha_k)}{\Gamma\left(\sum_{k=1}^{K} (N_{d,k} + \alpha_k)\right)} \times \prod_{k=1}^{K} \frac{\prod_{v=1}^{V} \Gamma(N_{k,v} + \beta_v)}{\Gamma\left(\sum_{v=1}^{V} (N_{k,v} + \beta_v)\right)} \qquad (3)
\end{align}
</math>

W is the collection of all words in the D documents, and Z is the collection of corresponding topics assigned to each word in W. Θ is the collection <math>(\theta_1, \ldots, \theta_D)</math>, and Φ is the collection <math>(\phi_1, \ldots, \phi_K)</math>. <math>N_{d,k}</math> is the number of words assigned to topic k in document d, and <math>N_{k,v}</math> is the number of words with topic k and value v in the vocabulary.

Because the generative process of PhraseLDA is almost the same as that of LDA except for the phrase constraints on topics, its joint distribution is the same as that of LDA in (3) above. But PhraseLDA and LDA differ in how they calculate the full conditional distribution (4), from which we can sample topics using Gibbs sampling. We also know that the full conditional distribution (4) is proportional to the joint distribution (1).
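To make the generative process and expression (3) concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It draws a toy corpus by following steps (1) and (2) of the LDA generative process and then evaluates the logarithm of the right-hand side of (3) from the count statistics. The function names (`sample_dirichlet`, `generate_corpus`, `log_joint_unnormalized`) and the toy parameter values are assumptions introduced here for illustration only.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def sample_categorical(probs, rng):
    """Draw an index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(alpha, beta, doc_lengths, rng):
    """Follow the LDA generative process; return word ids W and topics Z."""
    num_topics = len(alpha)
    phi = [sample_dirichlet(beta, rng) for _ in range(num_topics)]  # step (1)
    W, Z = [], []
    for n_d in doc_lengths:                                          # step (2)
        theta = sample_dirichlet(alpha, rng)                         # step (2a)
        z_d = [sample_categorical(theta, rng) for _ in range(n_d)]   # step (2b-i)
        w_d = [sample_categorical(phi[z], rng) for z in z_d]         # step (2b-ii)
        W.append(w_d)
        Z.append(z_d)
    return W, Z


def log_joint_unnormalized(W, Z, alpha, beta):
    """Log of the right-hand side of (3): log p(W, Z) up to a constant."""
    num_topics, vocab_size = len(alpha), len(beta)
    n_kv = [[0] * vocab_size for _ in range(num_topics)]  # N_{k,v}
    total = 0.0
    for w_d, z_d in zip(W, Z):
        n_dk = [0] * num_topics                            # N_{d,k}
        for w, z in zip(w_d, z_d):
            n_dk[z] += 1
            n_kv[z][w] += 1
        total += sum(math.lgamma(n + a) for n, a in zip(n_dk, alpha))
        total -= math.lgamma(sum(n_dk) + sum(alpha))
    for k in range(num_topics):
        total += sum(math.lgamma(n + b) for n, b in zip(n_kv[k], beta))
        total -= math.lgamma(sum(n_kv[k]) + sum(beta))
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = [0.5, 0.5]   # toy values, K = 2
    beta = [0.5] * 4     # toy values, V = 4
    W, Z = generate_corpus(alpha, beta, doc_lengths=[5, 3], rng=rng)
    print(W, Z, log_joint_unnormalized(W, Z, alpha, beta))
```

Working in log space with `math.lgamma` avoids overflow of the gamma function for realistic counts. A Gibbs sampler would compare such values across candidate topic assignments rather than use them in isolation.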
:<math>
\begin{align}
& p(z_{a,b} = i \mid Z_{\neg a,b}, W; \alpha, \beta) \qquad (4) \\
&= p(z_{a,b} = i \mid w_{a,b} = j, Z_{\neg a,b}, W_{\neg a,b}; \alpha, \beta) \\
&\propto p(W, Z; \alpha, \beta)
\end{align}
</math>

<math>z_{a,b}</math> is the topic assigned to <math>w_{a,b}</math>, which is the b-th unit in document a. <math>W_{\neg a,b}</math> is the collection of all units except <math>w_{a,b}</math>, and <math>Z_{\neg a,b}</math> is the collection of corresponding topic assignments. In LDA <math>w_{a,b}</math> is the b-th word in document a, and in PhraseLDA <math>w_{a,b}</math> is the b-th phrase in document a; this difference results in different topic inference processes.

For PhraseLDA, we can simplify the two Dirichlet-Multinomial expressions (3) to sample the topic of a phrase as follows; please see the Appendix for the detailed derivation.

:<math>
p(z_{a,b} = i \mid Z_{\neg a,b}, W; \alpha, \beta) \propto \prod_{g=1}^{l_{a,b}} \frac{\left(N_{a,i}^{\neg a,b} + \alpha_i + g - 1\right) \times \left(N_{i,w_{a,b,g}}^{\neg a,b} + \beta_{w_{a,b,g}}\right)}{g - 1 + \sum_{v=1}^{V} \left(N_{i,v}^{\neg a,b} + \beta_v\right)} \qquad (5)
</math>

<math>l_{a,b}</math> is the length of the b-th phrase in document a, and <math>w_{a,b,g}</math> is the g-th word in phrase <math>w_{a,b}</math>. <math>N_{a,i}^{\neg a,b}</math> is the number of words assigned to topic i in document a after excluding the phrase <math>w_{a,b}</math>, and <math>N_{i,v}^{\neg a,b}</math> is the number of words with topic i and value v after excluding the phrase <math>w_{a,b}</math>.

α and β can be optimized using the method presented by Minka [14] for the phrase-based topic model before users' refinement operations.

====Refinement Operations of Users====
Smith et al. [16] identified a set of refinements that users expected to be able to use in an interactive topic model, and implemented seven refinements requested by users: add word, remove word, change word order, remove document, split topic, merge topic, and add to stop words.

Participants of the qualitative evaluation in [16] found change word order to be one of the least useful refinements, and as shown in Table 1, with the phrase representation of topics the phrase order does not have much influence on human interpretability. Add to stop words is easy: we just exclude the word w from the vocabulary and ensure that the Gibbs sampler ignores all occurrences of w in the corpus. So we skip detailed discussion of these two operations in the paper, and extend the other operations to work on phrases instead of words.

====Update Prior Knowledge with phrase constraints====
Adding a human in the loop requires the user to be able to inject their knowledge via feedback into the sampling equation to guide the algorithm to better topics [16].

The Dirichlet Forest prior has been widely used to encode users’ feedback as prior knowledge in various interactive topic models [1, 10, 16]. This kind of prior attempts to enforce hard and topic-independent rules that similar words should have similar probabilities in all topics, which is questionable in that two words with similar representativeness of one topic are not necessarily of equal importance for another topic [20]. For example, in a fruit topic, the words apple and orange have similar representativeness, while in an IT company topic, apple has much higher importance than orange. The Dirichlet Forest prior is unable to differentiate the subtleties of word sense across topics and would falsely put irrelevant words into the same topic [20]. For instance, since orange and Microsoft are both labeled as similar to apple and are required to have similar probabilities in all topics as apple has, in the end they will be unreasonably allocated to the same topic.

Wallach et al. [17] found that an asymmetric Dirichlet prior has substantial advantages over a symmetric prior in topic models, and to address the above problems, Smith et al. [16] proposed an asymmetric prior which encodes users’ feedback by modifying the Dirichlet prior parameters for each document and each topic involved. A similar idea can be extended to address phrase constraints and applied to the phrase-based interactive model. In the previous section on phrase-based topic inference, all documents share the same α and all topics share the same β. Here, every document a and every topic i involved in refinement operations need a corresponding separate <math>\alpha^{(a)}</math> and <math>\beta^{(i)}</math>, respectively, and the sampling equation (5) should be updated as follows:

:<math>
\prod_{g=1}^{l_{a,b}} \frac{\left(N_{a,i}^{\neg a,b} + \alpha_i^{(a)} + g - 1\right) \times \left(N_{i,w_{a,b,g}}^{\neg a,b} + \beta_{w_{a,b,g}}^{(i)}\right)}{g - 1 + \sum_{v=1}^{V} \left(N_{i,v}^{\neg a,b} + \beta_v^{(i)}\right)} \qquad (6)
</math>

These priors <math>\alpha^{(a)}</math> and <math>\beta^{(i)}</math> are sometimes called “pseudo-counts” [9], and interactive models can take advantage of them by creating pseudo-counts to encourage the changes users want to see in the topic [16].

Remove document and Merge topic are straightforward and almost the same as the unigram-based updates proposed in [16].

* Remove document: to remove document a from topic i, we invalidate the topic assignment for all words in document a and assign a very small prior <math>\alpha_i^{(a)}</math> to topic i in a.
* Merge topic: merging topics <math>i_1</math> and <math>i_2</math> means the model will have a combined topic that represents both <math>i_1</math> and <math>i_2</math>. We assign <math>i_1</math> to all words that were previously assigned to <math>i_2</math>, and reduce the number of topics.

For Remove phrase, Split topic and Add phrase, the corresponding updates are a bit more complicated since we need to deal with phrase constraints. For a phrase p, <math>l_p</math> is the length of p and <math>p_g</math> is the g-th word in p, where <math>1 \le g \le l_p</math>.

* Remove phrase: to remove the phrase p from topic i, we need to locate all occurrences of p assigned to topic i and invalidate their topic assignment. For topic i, a very small prior <math>\beta_{p_g}^{(i)}</math> is assigned to each word <math>p_g</math> contained in p.
* Split topic: to split topic <math>i_1</math>, the user provides a subset of seed phrases, which need to be moved from the original topic <math>i_1</math> to a new topic <math>i_2</math>. We invalidate the original topic assignment of all seed phrase occurrences, increase the number of topics, and assign a large prior <math>\beta_{p_g}^{(i_2)}</math> for each word <math>p_g</math> contained in each seed phrase p for the new topic <math>i_2</math>.
* Add phrase: to add the phrase p to topic i, we invalidate all occurrences of p from all other topics and encourage the Gibbs sampler to assign topic i for each occurrence, and we increase the prior of each word contained in p for topic i.

===4 EXPERIMENTS===
We deployed the phrase-based interactive topic model as a part of our corporate learning platform for data scientist training programs, in which a database contains 19852 recent machine learning related papers collected from ICML/NIPS/arXiv. For phrase mining, we used our own tool based on the generalized suffix tree (GST) presented in [18], and segmented the titles and abstracts of all papers into a collection of meaningful phrases.

In order to facilitate human exploration and interpretation, we visualize these papers as 20 topics using our system, and learners can further interactively refine the topics using their domain knowledge as shown in Figure 3. The topics listed in the left panel are each represented by their top three phrases. Selecting a topic displays more detail in the right panel: the top 30 phrases with frequency and the top associated documents with corresponding percentage. Users can click and select phrases for removing with the remove phrase button or for splitting with the split topic button, click and select documents for removing with the remove document button, add new phrases from the vocabulary with the add phrase button, select phrases and click the add to stop words button to move them to the stop words list, or click the merge topic button to input two topics for merging.

[Figure 3: UI of the phrase-based interactive topic model.]

Before we implemented our phrase-based interactive model illustrated in Figure 2, we first tried the model based on the Dirichlet Forest prior presented in [10], and found a few drawbacks. Instead of direct modification, people are forced to think in terms of pairwise relations, which is counter-intuitive. Its prior tree structure makes it hard to encode phrase constraints and results in extremely slow convergence, with latency more than 50 times that of our system. We also tried to extend the model presented in [21] to support phrases, and checked whether the human interpretability of the generated topics improved. Correlation scores based on phrase embedding vectors generated by fastText [5] are calculated to build a two-level tree prior. The model attempts to encourage phrases close in embedding vector space to appear in the same topic, but we found that it only performs slightly better on downstream tasks, such as classification, and does not really help to enhance human interpretability. The above observations led to creating our current system.

Our phrase-based topic model before refinement operations was initialized with 2000 iterations using the optimized α (with mean 0.415) and β (with mean 0.015). Since this is a one-time job, we can set an even larger iteration number. The number of sampling iterations for updating and refining the model can be tuned according to the latency acceptable for users (for example, less than 1 minute), and we set the number to 400. Similar to [16], <math>\beta_{p_g}^{(i)}</math> is set to 0.000001 for remove phrase and split topic.

This paper focuses on improving the human interpretability of interactive topic models. Automated methods of measuring topic quality in terms of coherence often do not correlate well with human judgement and interpretation, and these methods are generally only available for unigram-based models, so the experimental evaluation in this paper is mainly based on user studies. Five participants with a computer science or electronic engineering background, who are users of the corporate learning platform, were asked to use and refine the phrase-based interactive topic model.

Our user studies showed that split topic and remove phrase are the most commonly used operations, and occasionally merge topic is used based on users’ personal preference. But add phrase is a relatively rare operation, because in most cases it is not easy for users to discover or remember phrases not presented to them, especially for a new domain.

There are a couple of coherent but non-informative topics. For example, one topic mainly contains phrases such as training data, data sets, data points, and another topic mainly contains phrases such as experimental results, theoretical results. Except for these uninformative topics, all 5 participants agreed that our system can significantly refine the quality and coherence of all other topics and consistently improve the human interpretability of topic modeling. The user studies showed that a well-organized structure can be established and refined by our phrase-based interactive topic model.

Several typical examples from participants’ real refinement operations are demonstrated here. In the first example shown in Figure 4, a participant found that two unrelated topics (social media and Autonomous driving) were mistakenly mixed into one topic, and she selected Social media as a seed phrase for split topic.
In the second example shown in Figure 5, the existing topic was actually fine, but a participant wanted to refine and separate a fine-grained new topic on face recognition from the existing topic on image processing, and she selected Face recognition as a seed phrase for split topic. Interestingly, although only one seed phrase was selected for the new topic in the above two examples, other unselected phrases related to the seed can correctly move to the new topic as well. In the third example, a participant found that an important phrase, Computer vision, was assigned to an unexpected and inappropriate topic which is not really meaningful, and she wanted to remove Computer vision from this inappropriate topic and check whether it is possible to finally move it to a meaningful topic. After two rounds of remove phrase, the phrase Computer vision moved to an appropriate topic as shown in Figure 6.

[Figure 4: Real example of splitting two unrelated topics. A topic mixing social media and autonomous driving phrases is split with the seed phrase “social media”; the resulting topics separate phrases such as Social media, Fake news, Recommender systems and User preferences from phrases such as Autonomous driving, Autonomous vehicles, Traffic sign and Traffic light.]

[Figure 5: Real example of separating a fine-grained new topic. An image processing topic is split with the seed phrase “face recognition”; the new topic gathers phrases such as Face recognition, Face images, Facial expressions, Facial landmark, Face detection and Face verification.]

[Figure 6: Real example of removing phrase from inappropriate topics after two rounds and moving to an appropriate topic finally. The phrase Computer vision starts in an inappropriate topic A, moves after the first round to a new but still inappropriate topic B (a “Computational complexity” related topic), and moves after the second round to an appropriate topic C (a “Medical image” related topic).]

===5 CONCLUSION AND FUTURE WORK===
This paper proposes the first phrase-based interactive topic model, which can provide both high interpretability and high interactivity with a human in the loop, and demonstrates and examines the proposed approach with real data. Although the latency of our system has significantly improved compared with previous systems based on tree priors, it can still be a major issue for large-scale data, so we need to study more efficient methods of inference using sparsity [22], which can be smoothly applied to systems with phrase constraints. Current methods for automatically measuring topic coherence and quality are also mainly for models based on unigrams [2, 11], so we also need to study how to extend the corresponding methods to phrase-based models.

===REFERENCES===
: [1] David Andrzejewski, Xiaojin Zhu, and Mark Craven. 2009. Incorporating Domain Knowledge into Topic Modeling via Dirichlet Forest Priors. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09). 25–32.
: [2] Shraey Bhatia, Jey Han Lau, and Timothy Baldwin. 2018. Topic Intrusion for Automatic Topic Model Evaluation. In EMNLP. Association for Computational Linguistics, 844–849.
: [3] D. Blei and J. Lafferty. 2009. Visualizing Topics with Multi-Word Expressions. arXiv 0907.1013v1 (2009).
: [4] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. J. Mach. Learn. Res. 3 (March 2003), 993–1022.
: [5] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching Word Vectors with Subword Information. arXiv preprint arXiv:1607.04606 (2016).
: [6] Jordan Boyd-Graber. 2017. Humans and Computers Working Together to Measure Machine Learning Interpretability. The Bridge 47 (2017), 6–10.
: [7] Marina Danilevsky, Chi Wang, Nihit Desai, Xiang Ren, Jingyi Guo, and Jiawei Han. 2014. Automatic Construction and Ranking of Topical Keyphrases on Collections of Short Documents. In Proceedings of the 2014 SIAM International Conference on Data Mining, Philadelphia, Pennsylvania, USA, April 24-26, 2014. 398–406.
: [8] Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare R. Voss, and Jiawei Han. 2014. Scalable Topical Phrase Mining from Text Corpora. Proc. VLDB Endow. 8, 3 (Nov. 2014), 305–316.
: [9] Gregor Heinrich. 2004. Parameter estimation for text analysis. Technical Report.
: [10] Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. 2014. Interactive Topic Modeling. Machine Learning 95 (2014), 423–469.
: [11] Jey Han Lau, David Newman, and Timothy Baldwin. 2014. Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality. In EACL. The Association for Computer Linguistics, 530–539.
: [12] Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. 2017. The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human-Computer Studies (2017).
: [13] Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, and Jiawei Han. 2015. Mining Quality Phrases from Massive Text Corpora. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD ’15). 1729–1744.
: [14] Thomas P. Minka. 2000. Estimating a Dirichlet distribution. Technical Report. MIT.
: [15] J. Shang, J. Liu, M. Jiang, X. Ren, C. R. Voss, and J. Han. 2018. Automated Phrase Mining from Massive Text Corpora. IEEE Transactions on Knowledge and Data Engineering 30, 10 (2018), 1825–1837.
: [16] Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. 2018. Closing the Loop: User-Centered Design and Evaluation of a Human-in-the-Loop Topic Modeling System. In 23rd International Conference on Intelligent User Interfaces (IUI ’18). 293–304.
: [17] Hanna M. Wallach, David Mimno, and Andrew McCallum. 2009. Rethinking LDA: Why Priors Matter. In Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS’09). 1973–1981.
: [18] Jun Wang, Junfu Xiang, and Kanji Uchino. 2015. Topic-Specific Recommendation for Open Education Resources. In Advances in Web-Based Learning – ICWL 2015, Frederick W.B. Li, Ralf Klamma, Mart Laanpere, Jun Zhang, Baltasar Fernández Manjón, and Rynson W.H. Lau (Eds.). Springer International Publishing, Cham, 71–81.
: [19] Xuerui Wang, Andrew McCallum, and Xing Wei. 2007. Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval. In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining (ICDM ’07). 697–702.
: [20] Pengtao Xie, Diyi Yang, and Eric P Xing. 2015. Incorporating Word Correlation Knowledge into Topic Modeling. In The 2015 Conference of the North American Chapter of the Association for Computational Linguistics.
: [21] Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. 2017. Adapting Topic Models using Lexical Associations with Tree Priors. In Empirical Methods in Natural Language Processing.
: [22] Yi Yang, Doug Downey, and Jordan Boyd-Graber. 2015. Efficient Methods for Incorporating Knowledge into Topic Models. In Empirical Methods in Natural Language Processing.