Interactive Topic Model with Enhanced Interpretability

Jun Wang, Fujitsu Laboratories of America, Sunnyvale, CA, jun.wang@us.fujitsu.com
Changsheng Zhao, Columbia University, New York City, NY, cz2458@columbia.edu
Junfu Xiang, Fujitsu Nanda Software Tech. Co., Ltd., Nanjing, China, xiangjf.fnst@cn.fujitsu.com
Kanji Uchino, Fujitsu Laboratories of America, Sunnyvale, CA, kanji@us.fujitsu.com

ABSTRACT
Although existing interactive topic models allow untrained end users to easily encode their feedback and iteratively refine the topic models, their unigram representations often result in ambiguous descriptions of topics and poor interpretability for users. To address these problems, this paper proposes the first phrase-based interactive topic model, which provides both high interpretability and high interactivity with a human in the loop. First, we present an approach that augments unigrams with a list of probable phrases, offering a more intuitively interpretable and accurate topic description, and that efficiently encodes users' feedback as phrase constraints in the interactive process of refining topic models. Second, the proposed approach is demonstrated and examined with real data.

CCS CONCEPTS
• Human-centered computing → Human computer interaction (HCI); • Computing methodologies → Machine learning.

KEYWORDS
Interactive Topic Model; Interpretability; Human in the loop

ACM Reference Format:
Jun Wang, Changsheng Zhao, Junfu Xiang, and Kanji Uchino. 2019. Interactive Topic Model with Enhanced Interpretability. In Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019, 7 pages.

IUI Workshops'19, March 20, 2019, Los Angeles, USA
© Copyright 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1   INTRODUCTION
Topic models are a useful and ubiquitous tool for understanding large electronic archives: they can discover the hidden themes that pervade a collection, annotate documents according to those themes, and further organize, summarize, and search the texts [4]. However, as fully unsupervised methods, vanilla topic models such as Latent Dirichlet Allocation (LDA) [4] often generate topics that do not fully make sense to end users [10]. Some generated topics may not correspond well to meaningful concepts; for instance, two or more themes can be confused into one topic, or two different topics can be (near) duplicates. Other topics may not align well with users' modeling goals or judgements. For many users in computational social science, digital humanities, and information studies who are not machine learning experts, topic models are therefore often a "take it or leave it" proposition [6, 10]. In contrast to purely unsupervised topic models, which often produce unexpected topics, taking prior knowledge into account enables more meaningful topics [20]. Interactive topic models with a human in the loop have been proposed to let untrained end users easily encode their feedback as prior knowledge and iteratively refine the topic models (e.g., changing which words are included in a topic, or merging or splitting topics) [10, 12, 16].

A topic is typically modeled as a categorical distribution over terms, and frequent terms related by a common theme are expected to have a large probability [8]. It is of interest to visualize these topics in order to facilitate human interpretation and exploration of large amounts of unorganized text, and a list of the most probable terms is often used to describe each topic. Like vanilla topic models, all existing interactive topic models represent topics with unigrams, which often provide an ambiguous description of a topic and poor interpretability for end users [8]. Smith et al. [16] conducted user studies on a unigram-based interactive topic model and observed requests from participants for the ability to add phrases and support multi-word refinements as opposed to single tokens.
Table 1: Real example of a topic discovered by the unigram-based topic model and the phrase-based topic model, respectively.

             Topic: natural language processing
             unigrams       phrases
             model          word embeddings
             language       natural language
             word           language model
             text           question answering
             task           machine translation
             question       sentiment analysis
             sentence       neural machine translation
             translation    natural language processing
             neural         text classification
             natural        word representation

As shown in Table 1, human interpretation often relies on the inherent grouping of words into phrases, and augmenting unigrams with a list of probable phrases offers a more intuitively interpretable and accurate topic description [8]. Under the "bag-of-words" assumption, phrases are decomposed into unigrams and a phrase's meaning may be lost, so topic models need to systematically assign topics to whole phrases. Several phrase-based topic models [3, 7, 8, 19] have been proposed to discover topical phrases and address this prevalent deficiency of visualizing topics with unigrams. But all of these models are static systems that end users cannot easily and interactively refine, so they suffer from the same "take it or leave it" issues.

To address the above problems, this paper proposes the first phrase-based interactive topic model, which provides both high interpretability and high interactivity, as shown in Figure 1. First, we present an approach that discovers topical phrases of mixed lengths through phrase mining and phrase-based topic inference, and that efficiently encodes users' feedback as phrase constraints in the interactive process of refining topic models. Second, the proposed approach is demonstrated and examined with real data.

We organize the remainder of the paper as follows. Section 2 introduces related work. Section 3 illustrates the general framework we propose. Section 4 presents our experimental results on real-world data. Finally, Section 5 summarizes our work and discusses future work.

2   RELATED WORK
Various approaches have been proposed to encode users' feedback as prior knowledge into topic models, instead of relying purely on how often words co-occur in different contexts. Andrzejewski et al. [1] imposed a Dirichlet Forest prior over the topic-word categoricals to encode Must-Links and Cannot-Links between words: words with Must-Links are encouraged to have similar probabilities within all topics, while words with Cannot-Links are disallowed from simultaneously having large probabilities within any topic. Xie et al. [20] studied how to incorporate external word-correlation knowledge to improve the coherence of topic modeling, and built a Markov Random Field (MRF) regularized topic model that encourages words labeled as similar to share the same topic label. Yang et al. [21] integrated lexical associations into topic optimization using tree priors to improve topic interpretability, providing a flexible framework that can take advantage of both first-order word associations and the higher-order associations captured by word embeddings.

Several unigram-based interactive topic models have been proposed and studied. Hu et al. [10] extended the framework of the Dirichlet Forest prior and proposed the first interactive topic model. Lee et al. [12] employed a user-centered approach to identify the set of topic refinement operations that users expect from an interactive topic model system. However, they did not implement an underlying algorithm to refine the topic models and only used Wizard-of-Oz refinements: the resulting topics were updated superficially, not as the output of a data-driven statistical model (the goal of topic models) [16]. Smith et al. [16] further implemented an efficient asymmetric-prior-based interactive topic model with a broader set of user-centered refinement operations, and conducted a study with twelve non-expert participants to examine how end users are affected by issues that arise in a fully interactive, user-centered system.

Some researchers have proposed phrase-based topic models. Wang et al. [19] attempted to infer phrases and topics simultaneously by creating a complex generative mechanism; the resulting models can directly output phrases and their latent topic assignments. Their model uses additional latent variables and word-specific multinomials to model bigrams, which can be combined to form n-gram phrases. KERT [7] and Turbo Topics [3] construct topical phrases as a post-processing step on top of unigram-based topic models. These methods generally produce low-quality topical phrases or suffer from poor scalability outside small datasets [8]. El-Kishky et al. [8] and Wang et al. [18] proposed computationally efficient and effective approaches that combine a phrase mining framework, which segments each document into single-word and multi-word phrases, with a topic model under phrase constraints that operates on the induced document partition.

3   FRAMEWORK
For phrase-based topic models, the preferred method is to first mine phrases and segment each document into single-word and multi-word phrases, and then run topic inference with phrase constraints [8]. End users can give feedback through a variety of refinement operations on the topical phrase visualization; their feedback updates the prior knowledge, and the phrase-based topic inference is rerun with the updated prior. As shown in Figure 2, this process forms a loop in which users can continuously and interactively update and refine the topic model with phrase constraints.
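The loop can be summarized as the following minimal Python sketch; it only fixes the control flow, and the callables passed in (phrase-based inference, prior updates, and the collection of user refinements) stand for the components described in the rest of this section.

```python
def interactive_topic_modeling(segmented_docs, infer_topics, collect_refinements,
                               update_priors, priors, max_rounds=10):
    """Control flow of Figure 2: infer, visualize, collect feedback, update
    priors, and rerun inference until the user is satisfied."""
    model = infer_topics(segmented_docs, priors)        # phrase-based topic inference
    for _ in range(max_rounds):
        refinements = collect_refinements(model)        # e.g. remove phrase, split topic
        if not refinements:
            break                                       # user accepts the current topics
        priors = update_priors(priors, refinements)     # feedback becomes pseudo-counts
        model = infer_topics(segmented_docs, priors)    # rerun with phrase constraints
    return model
```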
Figure 1: Comparison of our phrase-based interactive topic model with other related topic models. (The figure plots interpretability against interactivity: unigram-based vanilla topic models are low on both axes, phrase-based topic models offer high interpretability but low interactivity, unigram-based interactive topic models offer high interactivity but low interpretability, and our phrase-based interactive topic model is high on both.)

Figure 2: The framework of the phrase-based interactive topic model. (Pipeline: corpus → phrase mining → phrase-based topic inference → topical phrase visualization → refinement operations of users → update prior knowledge with phrase constraints → back to phrase-based topic inference.)
Phrase Mining
Phrase mining is a text mining technique that discovers semantically meaningful phrases in massive text. Recent data-driven approaches make use of frequency statistics in the corpus to address both candidate generation and quality estimation [7, 13, 15, 18]. They do not rely on complex linguistic feature generation, domain-specific rules, or extensive labeling efforts; instead, they rely on large corpora containing hundreds of thousands of documents to deliver superior performance. Several indicators, including frequency, mutual information, branching entropy, and comparison to super/sub-sequences, have been proposed to extract n-grams that reliably indicate frequent, concise concepts [7, 13, 15, 18].
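As a simple illustration of the frequency and mutual-information indicators (the phrase miners in [13, 15, 18] are considerably more sophisticated), the sketch below scores candidate two-word phrases; the thresholds and the scoring formula are only examples.

```python
from collections import Counter
from math import log

def score_bigrams(docs, min_count=5):
    """Score candidate two-word phrases by raw frequency and a simplified
    pointwise mutual information; docs is a list of token lists."""
    unigrams, bigrams, total = Counter(), Counter(), 0
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        total += len(tokens)
    scores = {}
    for (w1, w2), n in bigrams.items():
        if n < min_count:                                          # frequency indicator
            continue
        pmi = log(n * total / (unigrams[w1] * unigrams[w2]))       # association indicator
        scores[(w1, w2)] = (n, pmi)
    return scores
```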
Phrase-based topic inference
After inducing a partition on each document, we perform topic inference that assigns the same topic to each word in a phrase, and thus naturally to the phrase as a whole. El-Kishky et al. [8] proposed a probabilistic graphical model, PhraseLDA, whose generative process is almost the same as that of LDA but adds constraints on the topics of phrases; the corresponding phrase-based topic inference can be smoothly derived from the unigram-based topic inference of LDA.

LDA assumes that a document may contain multiple topics, where a topic is defined to be a categorical distribution over the words in the vocabulary. The generative process is as follows:

  (1) Draw ϕ_k ∼ Dirichlet(β), for 1 ≤ k ≤ K.
  (2) For each document d, where 1 ≤ d ≤ D:
      (a) Draw θ_d ∼ Dirichlet(α).
      (b) For the n-th word in document d, where 1 ≤ n ≤ N_d:
          (i) Draw z_{d,n} ∼ Categorical(θ_d).
          (ii) Draw w_{d,n} ∼ Categorical(ϕ_{z_{d,n}}).

Here α is a K-dimensional vector (α_1, ..., α_K) and β is a V-dimensional vector (β_1, ..., β_V); K is the number of topics, D is the number of documents, V is the size of the vocabulary, and N_d is the number of words in document d (a small simulation of this process is sketched below).
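The sketch below samples a tiny synthetic corpus from exactly this generative process using NumPy; it is only a notational sanity check, not part of the inference algorithm.

```python
import numpy as np

def generate_lda_corpus(K, V, doc_lengths, alpha, beta, seed=0):
    """Sample words and topic assignments from the LDA generative process.
    alpha is a length-K vector, beta a length-V vector."""
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(beta, size=K)            # step (1): one word distribution per topic
    docs, assignments = [], []
    for n_d in doc_lengths:                      # step (2): each document d
        theta = rng.dirichlet(alpha)             # (2a) topic proportions theta_d
        z = rng.choice(K, size=n_d, p=theta)     # (2b-i) topic z_{d,n} per token
        w = np.array([rng.choice(V, p=phi[k]) for k in z])   # (2b-ii) word w_{d,n}
        docs.append(w)
        assignments.append(z)
    return docs, assignments, phi
```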
Based on this generative process, the joint distribution of LDA in (1) can be written as the product of two Dirichlet-Multinomial distributions (2); the Dirichlet-Multinomial expressions in (3) will be further simplified later using properties of the gamma function Γ.

$$
\begin{aligned}
p(W, Z; \alpha, \beta) &= \int p(W, Z, \Theta, \Phi; \alpha, \beta)\, d\Theta\, d\Phi && (1)\\
&= \int p(Z, \Theta; \alpha)\, d\Theta \times \int p(W, \Phi \mid Z; \beta)\, d\Phi \\
&= p(Z; \alpha) \times p(W \mid Z; \beta) \\
&= \mathrm{DirMult}(Z; \alpha) \times \mathrm{DirMult}(W \mid Z; \beta) && (2)\\
&\propto \prod_{d=1}^{D} \frac{\prod_{k=1}^{K} \Gamma(N_{d,k} + \alpha_k)}{\Gamma\!\big(\sum_{k=1}^{K} (N_{d,k} + \alpha_k)\big)}
  \times \prod_{k=1}^{K} \frac{\prod_{v=1}^{V} \Gamma(N_{k,v} + \beta_v)}{\Gamma\!\big(\sum_{v=1}^{V} (N_{k,v} + \beta_v)\big)} && (3)
\end{aligned}
$$

W is the collection of all words in the D documents, and Z is the collection of the corresponding topics assigned to each word in W. Θ is the collection of (θ_1, ..., θ_D), and Φ is the collection of (ϕ_1, ..., ϕ_K). N_{d,k} is the number of words assigned to topic k in document d, and N_{k,v} is the number of words with topic k and vocabulary value v.

Because the generative process of PhraseLDA is almost the same as that of LDA, apart from the phrase constraints on topics, its joint distribution is the same as (3) above. PhraseLDA and LDA differ, however, in the full conditional distribution (4), from which topics are sampled using Gibbs sampling; the full conditional distribution (4) is proportional to the joint distribution (1).
$$
p(z_{a,b} = i \mid Z_{\neg(a,b)}, W; \alpha, \beta)
  = p(z_{a,b} = i \mid w_{a,b} = j, Z_{\neg(a,b)}, W_{\neg(a,b)}; \alpha, \beta)
  \propto p(W, Z; \alpha, \beta) \qquad (4)
$$

Here z_{a,b} is the topic assigned to w_{a,b}, the b-th unit in document a; W_{¬(a,b)} is the collection of all units except w_{a,b}, and Z_{¬(a,b)} is the collection of the corresponding topic assignments. In LDA, w_{a,b} is the b-th word in document a, while in PhraseLDA it is the b-th phrase in document a; this difference leads to different topic inference processes.

For PhraseLDA, we can simplify the two Dirichlet-Multinomial expressions in (3) to sample the topic of a phrase as follows (see the Appendix for the detailed derivation):

$$
p(z_{a,b} = i \mid Z_{\neg(a,b)}, W; \alpha, \beta)
  \propto \prod_{g=1}^{l_{a,b}}
    \frac{\big(N_{a,i}^{\neg(a,b)} + \alpha_i + g - 1\big)\,\big(N_{i,w_{a,b,g}}^{\neg(a,b)} + \beta_{w_{a,b,g}}\big)}
         {g - 1 + \sum_{v=1}^{V} \big(N_{i,v}^{\neg(a,b)} + \beta_v\big)} \qquad (5)
$$

l_{a,b} is the length of the b-th phrase in document a, and w_{a,b,g} is the g-th word in phrase w_{a,b}. N_{a,i}^{¬(a,b)} is the number of words assigned to topic i in document a after excluding the phrase w_{a,b}, and N_{i,v}^{¬(a,b)} is the number of words with topic i and vocabulary value v after excluding the phrase w_{a,b}.

α and β can be optimized using the method presented by Minka [14] for the phrase-based topic model before any refinement operations by users.
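For concreteness, a minimal NumPy sketch of the sampling step in Eq. (5) is shown below. It assumes the count arrays already exclude the phrase being resampled (the ¬(a,b) counts) and is an illustration rather than the authors' implementation.

```python
import numpy as np

def sample_phrase_topic(phrase_word_ids, doc_topic_counts, topic_word_counts,
                        topic_totals, alpha, beta, rng):
    """One collapsed Gibbs step for a phrase, following Eq. (5).
    doc_topic_counts: (K,) counts for the phrase's document,
    topic_word_counts: (K, V), topic_totals[k] = sum_v topic_word_counts[k, v];
    all counts must already exclude the phrase being resampled."""
    K = len(alpha)
    beta_sum = beta.sum()
    log_p = np.zeros(K)
    for i in range(K):
        for g, w in enumerate(phrase_word_ids):        # g runs over the words of the phrase
            numer = (doc_topic_counts[i] + alpha[i] + g) * \
                    (topic_word_counts[i, w] + beta[w])
            denom = g + topic_totals[i] + beta_sum
            log_p[i] += np.log(numer) - np.log(denom)
    p = np.exp(log_p - log_p.max())                    # normalize in log space for stability
    return rng.choice(K, p=p / p.sum())
```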
Refinement Operations of Users
Smith et al. [16] identified a set of refinements that users expect to be able to use in an interactive topic model and implemented seven of them: add word, remove word, change word order, remove document, split topic, merge topic, and add to stop words.

Participants in the qualitative evaluation of [16] found change word order to be one of the least useful refinements, and as shown in Table 1, with a phrase representation of topics the phrase order has little influence on human interpretability. Add to stop words is straightforward: we simply exclude the word w from the vocabulary and ensure that the Gibbs sampler ignores all occurrences of w in the corpus. We therefore skip detailed discussion of these two operations and extend the remaining operations to phrases instead of words.
Update Prior Knowledge with phrase constraints
Adding a human in the loop requires that the user be able to inject their knowledge, via feedback, into the sampling equation to guide the algorithm toward better topics [16].

The Dirichlet Forest prior has been widely used to encode users' feedback as prior knowledge in various interactive topic models [1, 10, 16]. This kind of prior attempts to enforce hard, topic-independent rules stating that similar words should have similar probabilities in all topics, which is questionable: two words with similar representativeness for one topic are not necessarily of equal importance for another topic [20]. For example, in a fruit topic the words apple and orange have similar representativeness, while in an IT company topic apple has much higher importance than orange. The Dirichlet Forest prior is unable to differentiate these subtleties of word sense across topics and can falsely put irrelevant words into the same topic [20]. For instance, if orange and Microsoft are both labeled as similar to apple and are thus required to have probabilities similar to apple's in all topics, they will in the end be unreasonably allocated to the same topic.

Wallach et al. [17] found that an asymmetric Dirichlet prior has substantial advantages over a symmetric prior in topic models, and to address the above problems Smith et al. [16] proposed an asymmetric prior that encodes users' feedback by modifying the Dirichlet prior parameters for each document and each topic involved. A similar idea can be extended to phrase constraints and applied to the phrase-based interactive model. In the previous section on phrase-based topic inference, all documents share the same α and all topics share the same β. Here, every document a and every topic i involved in a refinement operation gets its own α^(a) and β^(i), respectively, and the sampling equation (5) is updated as follows:

$$
p(z_{a,b} = i \mid Z_{\neg(a,b)}, W; \alpha, \beta)
  \propto \prod_{g=1}^{l_{a,b}}
    \frac{\big(N_{a,i}^{\neg(a,b)} + \alpha_i^{(a)} + g - 1\big)\,\big(N_{i,w_{a,b,g}}^{\neg(a,b)} + \beta_{w_{a,b,g}}^{(i)}\big)}
         {g - 1 + \sum_{v=1}^{V} \big(N_{i,v}^{\neg(a,b)} + \beta_v^{(i)}\big)} \qquad (6)
$$

These priors α^(a) and β^(i) are sometimes called "pseudo-counts" [9], and interactive models can take advantage of them by creating pseudo-counts that encourage the changes users want to see in a topic [16].
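One way to realize these per-document and per-topic priors is to store them as sparse overrides on top of the shared optimized vectors, so that only documents and topics touched by a refinement carry private copies. The class below is our illustrative sketch of such a layout, not the data structure of [16].

```python
import numpy as np

class RefinablePriors:
    """Per-document alpha^(a) and per-topic beta^(i), kept as overrides on top
    of the shared optimized priors used in Eq. (6)."""

    def __init__(self, alpha, beta):
        self.alpha = np.asarray(alpha, dtype=float)   # shared (K,) vector
        self.beta = np.asarray(beta, dtype=float)     # shared (V,) vector
        self.doc_alpha = {}      # document id -> private (K,) copy
        self.topic_beta = {}     # topic id    -> private (V,) copy

    def alpha_for(self, a):                           # alpha^(a) in Eq. (6)
        return self.doc_alpha.get(a, self.alpha)

    def beta_for(self, i):                            # beta^(i) in Eq. (6)
        return self.topic_beta.get(i, self.beta)

    def set_doc_alpha(self, a, topic, value):         # pseudo-count for a topic in doc a
        self.doc_alpha.setdefault(a, self.alpha.copy())[topic] = value

    def set_topic_beta(self, i, word, value):         # pseudo-count for a word in topic i
        self.topic_beta.setdefault(i, self.beta.copy())[word] = value
```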
Remove document and Merge topic are straightforward and almost the same as the unigram-based updates proposed in [16].

    • Remove document: to remove document a from topic i, we invalidate the topic assignment of all words in document a and assign a very small prior α_i^(a) to topic i in a.
    • Merge topic: merging topics i_1 and i_2 means the model will have a single combined topic that represents both i_1 and i_2. We assign i_1 to all words that were previously assigned to i_2 and reduce the number of topics.

For Remove phrase, Split topic, and Add phrase, the corresponding updates are a bit more complicated, since we need to deal with phrase constraints; a code sketch of these prior updates follows the list below. For a phrase p, l_p is the length of p and p_g is the g-th word in p, where 1 ≤ g ≤ l_p.

    • Remove phrase: to remove the phrase p from topic i, we locate all occurrences of p assigned to topic i and invalidate their topic assignments. For topic i, a very small prior β_{p_g}^{(i)} is assigned to each word p_g contained in p.
    • Split topic: to split topic i_1, the user provides a subset of seed phrases that should be moved from the original topic i_1 to a new topic i_2. We invalidate the original topic assignments of all seed-phrase occurrences, increase the number of topics, and assign a large prior β_{p_g}^{(i_2)} to each word p_g contained in each seed phrase p for the new topic i_2.
    • Add phrase: to add the phrase p to topic i, we invalidate all occurrences of p in all other topics, encourage the Gibbs sampler to assign topic i to each occurrence, and increase the prior of each word contained in p for topic i.
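The sketch below expresses the three phrase-level operations as updates on the RefinablePriors layout introduced earlier; the TINY and LARGE magnitudes are placeholders (the paper only fixes the small value, 0.000001), and the invalidation of existing topic assignments is assumed to happen in the sampler itself.

```python
# Illustrative prior updates for the phrase-level refinement operations,
# reusing the RefinablePriors sketch above. TINY matches the small prior used
# for remove phrase / split topic; LARGE is a placeholder magnitude.
TINY, LARGE = 1e-6, 100.0

def remove_phrase(priors, phrase_word_ids, topic):
    # Discourage every word of the phrase from re-entering `topic`.
    for w in phrase_word_ids:
        priors.set_topic_beta(topic, w, TINY)

def split_topic(priors, seed_phrases, new_topic):
    # The caller has already increased the number of topics; pull the seed
    # phrases (lists of word ids) toward the new topic.
    for phrase_word_ids in seed_phrases:
        for w in phrase_word_ids:
            priors.set_topic_beta(new_topic, w, LARGE)

def add_phrase(priors, phrase_word_ids, topic):
    # Encourage the sampler to place every occurrence of the phrase in `topic`.
    for w in phrase_word_ids:
        priors.set_topic_beta(topic, w, LARGE)
```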
4 EXPERIMENTS
We deployed the phrase-based interactive topic model as part of our corporate learning platform for data scientist training programs, whose database contains 19,852 recent machine learning papers collected from ICML, NIPS, and arXiv.

For phrase mining, we used our own tool based on a generalized suffix tree (GST), presented in [18], and segmented the titles and abstracts of all papers into collections of meaningful phrases.

To facilitate human exploration and interpretation, we visualize these papers as 20 topics in our system, and learners can further interactively refine the topics using their domain knowledge, as shown in Figure 3. The list of topics in the left panel is represented by the top three phrases of each topic. Selecting a topic displays more detail in the right panel: the top 30 phrases with their frequencies and the top associated documents with their corresponding percentages. Users can click and select phrases to remove with the remove phrase button or to split off with the split topic button, click and select documents to remove with the remove document button, add new phrases from the vocabulary with the add phrase button, select phrases and click the add to stop words button to move them to the stop-word list, or click the merge topic button and input two topics to merge.

Before implementing the phrase-based interactive model illustrated in Figure 2, we first tried the model based on the Dirichlet Forest prior presented in [10] and found a few drawbacks. Instead of direct modification, people are forced to think in terms of pairwise relations, which is counter-intuitive. Its prior tree structure makes it hard to encode phrase constraints and results in extremely slow convergence, with latency more than 50 times that of our system. We also tried to extend the model presented in [21] to support phrases and checked whether the human interpretability of the generated topics improves. Correlation scores based on phrase embedding vectors generated by Fasttext [5] were calculated to build a two-level tree prior. The model encourages phrases that are close in the embedding space to appear in the same topic, but we found that it only performs slightly better on downstream tasks, such as classification, and does not really enhance human interpretability. These observations led to our current system.

Our phrase-based topic model was initialized, before any refinement operations, with 2000 iterations using the optimized α (mean 0.415) and β (mean 0.015). Since this is a one-time job, an even larger iteration number could be used. The number of sampling iterations for updating and refining the model can be tuned according to the latency acceptable to users (for example, less than one minute), and we set it to 400. Similar to [16], β_{p_g}^{(i)} is set to 0.000001 for remove phrase and split topic.

Since this paper focuses on improving the human interpretability of interactive topic models, and since automated methods for measuring topic quality in terms of coherence often do not correlate well with human judgement and interpretation, and are in addition generally only available for unigram-based models, the experimental evaluation in this paper is mainly based on user studies. Five participants with a computer science or electronic engineering background, all users of the corporate learning platform, were asked to use and refine the phrase-based interactive topic model.

Our user studies showed that split topic and remove phrase are the most commonly used operations, with merge topic used occasionally depending on users' personal preferences. Add phrase is a relatively rare operation, because in most cases it is not easy for users to discover or remember phrases that are not presented to them, especially in a new domain.

There are a couple of coherent but uninformative topics: for example, one topic mainly contains phrases such as training data, data sets, and data points, and another mainly contains phrases such as experimental results and theoretical results. Apart from these uninformative topics, all five participants agreed that our system can significantly refine the quality and coherence of all other topics and consistently improve the human interpretability of topic modeling. The user studies showed that a well-organized structure can be established and refined with our phrase-based interactive topic model.
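For reference, the settings reported above can be collected into a single configuration; the dictionary below only groups those values for illustration and is not our actual configuration file.

```python
# Settings reported in this section (illustrative grouping of the values only).
EXPERIMENT_CONFIG = {
    "num_topics": 20,           # topics visualized for learners
    "init_iterations": 2000,    # one-time initialization of the phrase-based model
    "refine_iterations": 400,   # per refinement round, to keep latency acceptable
    "alpha_mean": 0.415,        # mean of the optimized alpha
    "beta_mean": 0.015,         # mean of the optimized beta
    "removal_beta": 1e-6,       # beta^(i)_{p_g} for remove phrase and split topic
}
```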
Figure 3: UI of the phrase-based interactive topic model.
Several typical examples from participants' real refinement operations are described here. In the first example, shown in Figure 4, a participant found that two unrelated topics (social media and autonomous driving) were mistakenly mixed into one topic, and she selected Social media as a seed phrase for split topic. In the second example, shown in Figure 5, the existing topic was actually fine, but a participant wanted to separate a fine-grained new topic on face recognition from the existing topic on image processing, and she selected Face recognition as a seed phrase for split topic. Interestingly, although only one seed phrase was selected for the new topic in these two examples, other unselected phrases related to the seed moved to the new topic as well. In the third example, a participant found that an important phrase, Computer vision, was assigned to an unexpected and inappropriate topic that was not really meaningful, and she wanted to remove Computer vision from this inappropriate topic and check whether it would eventually move to a meaningful topic. After two rounds of remove phrase, the phrase Computer vision moved to an appropriate topic, as shown in Figure 6.

Figure 4: Real example of splitting two unrelated topics. (The seed phrase "social media" is selected for the split topic operation on a topic mixing phrases such as social media, recommender systems, fake news, user study, and differential privacy with autonomous driving, autonomous vehicles, and traffic sign; after the split, the social-media phrases and the autonomous-driving phrases form two separate topics.)

Figure 5: Real example of separating a fine-grained new topic. (The seed phrase "face recognition" is selected for split topic on an image-processing topic containing phrases such as image classification, input image, style transfer, and image captioning; the new topic gathers face recognition, face images, facial expressions, face detection, face verification, and related phrases.)

Figure 6: Real example of removing a phrase from inappropriate topics over two rounds and finally moving it to an appropriate topic. (The phrase Computer vision starts in an inappropriate topic A, moves after the first remove phrase to a new but still inappropriate topic B dominated by phrases such as computational cost and computational complexity, and after the second round reaches an appropriate topic C related to medical imaging, with phrases such as medical imaging and MR images.)

5 CONCLUSION AND FUTURE WORK
This paper proposes the first phrase-based interactive topic model, which provides both high interpretability and high interactivity with a human in the loop, and demonstrates and examines the proposed approach with real data. Although the latency of our system is significantly improved compared with previous systems based on tree priors, it can still be a major issue for large-scale data, so we need to study more efficient inference methods that use sparsity [22] and can be smoothly applied to systems with phrase constraints. Current methods for automatically measuring topic coherence and quality are also mainly designed for unigram-based models
[2, 11], so we also need to study how to extend the corresponding methods to phrase-based models.

REFERENCES
[1] David Andrzejewski, Xiaojin Zhu, and Mark Craven. 2009. Incorporating Domain Knowledge into Topic Modeling via Dirichlet Forest Priors. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09). 25–32.
[2] Shraey Bhatia, Jey Han Lau, and Timothy Baldwin. 2018. Topic Intrusion for Automatic Topic Model Evaluation. In EMNLP. Association for Computational Linguistics, 844–849.
[3] D. Blei and J. Lafferty. 2009. Visualizing Topics with Multi-Word Expressions. arXiv 0907.1013v1 (2009).
[4] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. J. Mach. Learn. Res. 3 (March 2003), 993–1022.
[5] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching Word Vectors with Subword Information. arXiv preprint arXiv:1607.04606 (2016).
[6] Jordan Boyd-Graber. 2017. Humans and Computers Working Together to Measure Machine Learning Interpretability. The Bridge 47 (2017), 6–10.
[7] Marina Danilevsky, Chi Wang, Nihit Desai, Xiang Ren, Jingyi Guo, and Jiawei Han. 2014. Automatic Construction and Ranking of Topical Keyphrases on Collections of Short Documents. In Proceedings of the 2014 SIAM International Conference on Data Mining, Philadelphia, Pennsylvania, USA, April 24-26, 2014. 398–406.
[8] Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare R. Voss, and Jiawei Han. 2014. Scalable Topical Phrase Mining from Text Corpora. Proc. VLDB Endow. 8, 3 (Nov. 2014), 305–316.
[9] Gregor Heinrich. 2004. Parameter estimation for text analysis. Technical Report.
[10] Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. 2014. Interactive Topic Modeling. Machine Learning 95 (2014), 423–469.
[11] Jey Han Lau, David Newman, and Timothy Baldwin. 2014. Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality. In EACL. The Association for Computer Linguistics, 530–539.
[12] Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. 2017. The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human-Computer Studies (2017).
[13] Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, and Jiawei Han. 2015. Mining Quality Phrases from Massive Text Corpora. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). 1729–1744.
[14] Thomas P. Minka. 2000. Estimating a Dirichlet distribution. Technical Report. MIT.
[15] J. Shang, J. Liu, M. Jiang, X. Ren, C. R. Voss, and J. Han. 2018. Automated Phrase Mining from Massive Text Corpora. IEEE Transactions on Knowledge and Data Engineering 30, 10 (2018), 1825–1837.
[16] Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. 2018. Closing the Loop: User-Centered Design and Evaluation of a Human-in-the-Loop Topic Modeling System. In 23rd International Conference on Intelligent User Interfaces (IUI '18). 293–304.
[17] Hanna M. Wallach, David Mimno, and Andrew McCallum. 2009. Rethinking LDA: Why Priors Matter. In Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS '09). 1973–1981.
[18] Jun Wang, Junfu Xiang, and Kanji Uchino. 2015. Topic-Specific Recommendation for Open Education Resources. In Advances in Web-Based Learning – ICWL 2015, Frederick W.B. Li, Ralf Klamma, Mart Laanpere, Jun Zhang, Baltasar Fernández Manjón, and Rynson W.H. Lau (Eds.). Springer International Publishing, Cham, 71–81.
[19] Xuerui Wang, Andrew McCallum, and Xing Wei. 2007. Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval. In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining (ICDM '07). 697–702.
[20] Pengtao Xie, Diyi Yang, and Eric P. Xing. 2015. Incorporating Word Correlation Knowledge into Topic Modeling. In The 2015 Conference of the North American Chapter of the Association for Computational Linguistics.
[21] Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. 2017. Adapting Topic Models using Lexical Associations with Tree Priors. In Empirical Methods in Natural Language Processing.
[22] Yi Yang, Doug Downey, and Jordan Boyd-Graber. 2015. Efficient Methods for Incorporating Knowledge into Topic Models. In Empirical Methods in Natural Language Processing.