<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Finding Hierarchy of Topics from Twitter Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nghia Duong-Trung</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Schilling</string-name>
          <email>schilling@ismll.uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lars Schmidt-Thieme</string-name>
          <email>schmidt-thieme@ismll.uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Systems and Machine Learning Lab (ISMLL) Universitätsplatz 1</institution>
          ,
          <addr-line>31141 Hildesheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>Topic modeling of text collections is rapidly gaining importance for a wide variety of applications, including information retrieval and automatic multimedia indexing. Our motivation is to exploit hierarchical topic selection via nonnegative matrix factorization to capture the nature of the content of text posted on Twitter. This paper explores the use of an effective framework to automatically discover hidden topics and their sub-topics. As input, the framework uses textual data; the output is the discovered structure of topics. We introduce a conceptual topic modeling approach based on the idea of stability analysis to detect a hierarchy of topics in a given text source. In this process, we apply stability measurement in conjunction with nonnegative matrix factorization and WordNet to excavate hidden topics via scores of conceptual similarity. To demonstrate effectiveness and generalization, we apply the approach to a large-scale Twitter dataset to investigate its content topics. We also address the problems of several state-of-the-art topic modeling approaches that are unable to handle a large dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Unsupervised Learning</kwd>
        <kwd>Semantics in Text Mining</kwd>
        <kwd>Conceptual Stability</kwd>
        <kwd>Hierarchy of Topics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Nonnegative matrix factorization (NMF) with nonnegativity constraints has been considered an efficient representation and an emerging technique for text mining and document clustering [
        <xref ref-type="bibr" rid="ref11 ref17 ref2 ref22 ref23 ref9">22,2,17,23,9,11</xref>
        ]. For any desired low rank K, the NMF algorithm groups the data into clusters. The key issue is whether a given low rank K helps to decompose the data into appropriately separated clusters. Therefore, the problem we study in this paper is how to effectively and efficiently discover the most appropriate structure of topics given a text corpus by exploiting semantic meaning and conceptual stability. In general, the stability of a clustering model refers to its ability to consistently replicate similar solutions on data randomly generated from the same source. In practice, this involves repeatedly re-sampling the data, applying a topic selection model, and evaluating the results with a stability metric which measures the level of discrimination between the resulting clusterings.
      </p>
      <p>
        We start from previous work on the idea of random sub-sampling and stability analysis via consensus clustering to discover the number of clusters that best describes the data [
        <xref ref-type="bibr" rid="ref13 ref16 ref4">16,4,13</xref>
        ]. The basic assumption of stability in the context of consensus clustering is, in general, very intuitive: for particular observed data, if we perturb it into different random variations, and if they produce the same cluster composition, or consensus, without radical differences, we can confidently consider these clusters to represent real structure. Consensus clustering precisely captures this procedure. Further work by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] improved the consensus clustering technique by adding a quantitative evaluation of the robustness of the decomposition. They adopted a measure based on the cophenetic correlation coefficient, which indicates the dispersion of the consensus matrix. The coefficient is calculated as the Pearson correlation of two distance matrices: the consensus matrix, which captures the distance between data samples, and the average connectivity matrix over many clustering runs. Subsequently, [
        <xref ref-type="bibr" rid="ref12 ref13">12,13</xref>
        ] formulated the idea of the consensus matrix in the latent space learned by NMF.
      </p>
      <p>
        However, the computation of the consensus matrix, an n × n matrix where n is the number of tweets/documents, is very costly, e.g. a large amount of RAM is required. For instance, if we applied the previous method to the Twitter dataset that we describe later in this paper, 1400GB of RAM would be required to store the consensus matrix during the model's computation. Hence, the method provided by [
        <xref ref-type="bibr" rid="ref12 ref13">12,13</xref>
        ] is insufficient or even infeasible for large datasets. To overcome the drawbacks of constructing a consensus matrix, we propose a topic selection approach, called conceptual stability analysis, that integrates smoothly with NMF and can be applied to large datasets effectively.
      </p>
      <p>
        Moreover, we also evaluate several state-of-the-art topic modeling approaches based on Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The first baseline is the topic selection method implemented by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The second baseline is proposed by [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We implement the baselines using [
        <xref ref-type="bibr" rid="ref10 ref18">10,18</xref>
        ]. However, these methods threw exceptions during computation due to the size of the dataset. An upper bound on the RAM required by each approach before an exception occurs is 65GB.
      </p>
      <p>With these limitations in mind, we introduce an unsupervised topic selection method that enhances the accuracy and effectiveness of NMF-based models in the context of document clustering and topic modeling. We show that our proposed method works effectively on large datasets within acceptable computing resources, such as the RAM required and the computation time.</p>
    </sec>
    <sec id="sec-2">
      <title>Theoretical Aspects and Proposed Framework</title>
      <sec id="sec-2-1">
        <title>Nonnegative Matrix Factorization</title>
        <p>Consider a dataset X ∈ R^{n×m} containing a set of n documents where each document is described by m many features. The document features are mapped from a dictionary that comprises all words/terms/tokens in the dataset. Each positive entry X_{ij} is either a raw term frequency or a term frequency-inverse document frequency (TF-IDF) score. By r and τ, we denote the sampling rate and the number of subsets generated from X, respectively. Each subset X̃ ∈ R^{n′×m} is then a sample without replacement from X.</p>
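A minimal sketch of this data setup (the documents and all variable names below are invented for illustration): each document becomes a TF-IDF row vector, and subsets are drawn from X without replacement at sampling rate r.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "nmf groups documents into topic clusters",
    "stability analysis of topic models on twitter",
    "text mining and document clustering with matrix factorization",
    "wordnet similarity for conceptual topic stability",
]

X = TfidfVectorizer().fit_transform(docs)   # X in R^{n x m}, nonnegative
n = X.shape[0]
r, tau = 0.8, 3                             # sampling rate and number of subsets
rng = np.random.default_rng(0)
# each subset: a sample of the rows of X, drawn without replacement
subsets = [X[rng.choice(n, size=int(r * n), replace=False)] for _ in range(tau)]
```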
        <p>Given a desired number of topics k, the NMF algorithm iteratively computes an approximation:</p>
        <p>X ≈ WH,   (1)</p>
        <p>
          where W ∈ R^{n×k} and H ∈ R^{k×m} are nonnegative matrices. The conventional technique to approximate W and H is to minimize the difference between X and WH such that:
        </p>
        <p>min_{W≥0, H≥0} f(W, H) = (1/2) Σ_{i=1}^{n} Σ_{j=1}^{m} (X_{ij} − (WH)_{ij})² + ψ(W) + φ(H),   (2)</p>
        <p>where ψ(·) and φ(·) are regularization terms that are set as follows:</p>
        <p>ψ(W) = α ‖W‖²_F   and   φ(H) = β Σ_{i=1}^{m} ‖H(:, i)‖₁²,   (3)</p>
        <p>
          where H(:, i) indicates the i-th column of H. The L1 norm term of φ(H) promotes sparsity on the columns of H, while the Frobenius norm term of ψ(W) prevents W from growing too large. The scalar parameters α and β are used to control the strength of the regularization. The matrices W and H are found by minimizing Equation (2), estimating W and H in an alternating fashion using projected gradients or coordinate descent [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          We now discuss our approach of computing stability based on the usage of the WordNet hypernym hierarchy [
          <xref ref-type="bibr" rid="ref15 ref8">8,15</xref>
          ]. Given tokens c_p and c_q, wup(c_p, c_q) is the Wu-Palmer similarity [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], a scoring method based on how similar the token senses are and where they occur relative to each other in the WordNet hierarchy. The Wu-Palmer similarity is calculated by:
        </p>
        <p>wup(c_p, c_q) = 2d / (d_1 + d_2 + 2d),   (4)</p>
        <p>where d_1 and d_2 are the distances that separate the concepts c_p and c_q from their closest common ancestor and d is the distance which separates that closest common ancestor from the root node.</p>
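Equation (4) can be illustrated on a tiny hand-made hypernym tree (the parent map below is invented for the example; real scores would come from the WordNet hierarchy):

```python
# toy hypernym tree: child -> parent, with "animal" as the root
parent = {
    "dog": "canine", "wolf": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore",
    "carnivore": "mammal", "mammal": "animal", "animal": None,
}

def path_to_root(c):
    """Concepts from c up to the root, inclusive."""
    path = [c]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def wup(cp, cq):
    pp, pq = path_to_root(cp), path_to_root(cq)
    # closest common ancestor: first node on cp's path that also lies on cq's
    lca = next(a for a in pp if a in pq)
    d1, d2 = pp.index(lca), pq.index(lca)     # edges from cp, cq to the ancestor
    d = len(path_to_root(lca)) - 1            # edges from the ancestor to the root
    return 2 * d / (d1 + d2 + 2 * d)          # Equation (4)

print(wup("dog", "cat"))   # d1=2, d2=2, d=2 -> 0.5
```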
        <p>Each row of the low-rank matrix H represents one of the k topics and consists of scores for each term. However, we only consider the top t ≪ m terms, as they contribute most to the semantic meaning of a topic. In practice, the contribution of each token to topic i is represented by the scores in the i-th row of the matrix H generated by NMF. By sorting each row of H, we can assess the top t terms for each topic. The set of top t tokens for all topics of a given H is denoted by S = {R_1, …, R_k} such that R_i ∈ R^t is the i-th topic represented by its top t tokens. Within a topic, we calculate the conceptual stability score as follows:</p>
        <p>sim(R_v) = (2 / (t(t − 1))) Σ_{i=0}^{t−1} Σ_{j=i+1}^{t} wup(R_{vi}, R_{vj})   (5)</p>
        <p>Similarly, the conceptual stability score between two topics R_u and R_v is calculated in the same fashion:</p>
        <p>sim(R_u, R_v) = (1 / t²) Σ_{i=1}^{t} Σ_{j=1}^{t} wup(R_{ui}, R_{vj})   (6)</p>
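The two scores above can be sketched as plain averages of pairwise Wu-Palmer similarities; `toy_wup` below is an invented stand-in for a real wup function.

```python
from itertools import combinations

def sim_within(R, wup):
    """Eq. (5): average wup over all unordered token pairs of one topic."""
    t = len(R)
    return 2.0 * sum(wup(a, b) for a, b in combinations(R, 2)) / (t * (t - 1))

def sim_between(Ru, Rv, wup):
    """Eq. (6): average wup over all cross-topic token pairs (1/t^2 sum)."""
    return sum(wup(a, b) for a in Ru for b in Rv) / (len(Ru) * len(Rv))

# toy similarity: 1.0 for identical tokens, 0.5 otherwise
toy_wup = lambda a, b: 1.0 if a == b else 0.5
print(sim_within(["dog", "cat", "wolf"], toy_wup))            # 0.5
print(sim_between(["dog", "cat"], ["dog", "fish"], toy_wup))  # 0.625
```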
        <p>Finally, we consider the problem of measuring the conceptual stability between two different K-way topic clusterings S_w and S_l. Each ranked list contains the top t tokens that contribute the most semantic meaning to the i-th topic. Then, the conceptual stability between S_w and S_l is calculated by:</p>
        <p>con(S_w, S_l) = (1/K) Σ_{k=1}^{K} sim(R_{wk}, π(R_{lk})),   (7)</p>
        <p>
          where π(R_{wi}) denotes the ranked list R_{lj} matched to the ranked list R_{wi} by the permutation π. The optimal permutation is found by solving the minimal weight bipartite matching problem using the Hungarian method [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
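A sketch of this matching step, using SciPy's Hungarian-method solver to find the permutation π that maximizes the total topic-to-topic similarity (the `overlap` similarity and the two clusterings below are invented toy inputs):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def con_between(Sw, Sl, sim):
    """Eq. (7): mean matched topic-to-topic similarity under permutation pi."""
    score = np.array([[sim(Rw, Rl) for Rl in Sl] for Rw in Sw])
    rows, cols = linear_sum_assignment(-score)   # Hungarian method, maximizing
    return score[rows, cols].sum() / len(Sw)

# toy similarity: fraction of shared tokens
overlap = lambda Ru, Rv: len(set(Ru) & set(Rv)) / len(Ru)
Sw = [["dog", "cat"], ["car", "bus"]]
Sl = [["bus", "car"], ["cat", "dog"]]
print(con_between(Sw, Sl, overlap))   # 1.0 -- pi matches across the swap
```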
        <p>Moreover, the problem of measuring the conceptual stability within the K-way topic clustering S_w itself is also considered. The conceptual stability is then calculated as follows:</p>
        <p>con(S_w) = (1/K) Σ_{k=1}^{K} sim(R_{wk})   (8)</p>
        <p>We now consider the conceptual stability at a particular number of topics k. At first we apply NMF on the complete dataset X to get the factor matrix H, which we consider as the reference ranked lists. Let us define S_X as the reference K-way topic clustering, containing K ranked lists S_X = {R_{X,1}, …, R_{X,K}}.</p>
        <p>Subsequently, we randomly resample the documents of the original X τ times with the sampling rate r to obtain random subsets of X, which we denote by X̃. We then apply NMF on each X̃ to get the factor matrix H̃. This results in τ many sets {S_1, …, S_τ}, where each set contains k ranked lists S_j = {R_{j,1}, …, R_{j,k}}. Finally, we calculate the overall semantically conceptual stability at k as follows:</p>
        <p>stability(k) = (1/τ) · (Σ_{i=1}^{τ} con(S_X, S_i) · Σ_{i=1}^{τ} con(S_i)) / max(Σ_{i=1}^{τ} con(S_X, S_i), Σ_{i=1}^{τ} con(S_i))   (9)</p>
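One way to read the stability computation above can be sketched as follows; the `con_between` / `con_within` arguments are toy stand-ins for the con(·,·) and con(·) scores, and the normalization follows the fraction structure of Equation (9):

```python
def stability(S_X, subset_clusterings, con_between, con_within):
    """Combine agreement with the reference clustering and within-subset
    coherence, normalized by the larger of the two sums."""
    tau = len(subset_clusterings)
    a = sum(con_between(S_X, S_i) for S_i in subset_clusterings)
    b = sum(con_within(S_i) for S_i in subset_clusterings)
    return (a * b) / (tau * max(a, b))   # equals min(a, b) / tau

# toy scores: every subset agrees 0.8 with the reference, coheres 0.9 internally
score = stability("S_X", ["S1", "S2", "S3", "S4"],
                  con_between=lambda s, si: 0.8,
                  con_within=lambda si: 0.9)
print(round(score, 6))   # 0.8
```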
        <p>The maximum stability score is achieved if and only if the top t tokens appear in only one topic k. Conversely, the minimum stability score is obtained if the top t tokens overwhelmingly appear in every topic k.</p>
        <p>This process is repeated for a range of topics k. The most appropriate value of k is identified by the highest stability(k) score. However, the scores also reveal a possible range of k for further investigation. With the k-topic classification finalized at the first level, the dataset is split into sub-datasets where documents are assigned to the topic with the highest score, e.g. through the W matrix.</p>
        <p>Then, the process is repeated to discover sub-topics in each sub-dataset. Generally, we can expand the procedure deeper into the hierarchy. First, we calculate the most appropriate number of topics L in the whole dataset X. Then, a subset of X is drawn for each value in the range k ∈ {1, …, L}. The stability(k) score is, in turn, calculated for each subset to find the best number of sub-topics.</p>
        <p>k̂_{X_i} = argmax_k (W_{ik})   (10)</p>
        <p>Algorithm 1 (the conceptual stability analysis approach with a 2-level hierarchy) summarizes the whole procedure. Input: dataset X ∈ R^{n×m}, a range for the number of topics [K′, …, K″], the number of top tokens t, the sampling rate r, and the number of subsets τ.</p>
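A minimal sketch of this split step (Equation (10)); the matrix values below are invented, and each document is routed to the sub-dataset of its highest-scoring topic:

```python
import numpy as np

W = np.array([[0.9, 0.1],
              [0.2, 0.7],
              [0.6, 0.3]])        # stand-in document-topic matrix from NMF

assignment = W.argmax(axis=1)     # Eq. (10): k_hat for every document
sub_datasets = {k: np.where(assignment == k)[0] for k in range(W.shape[1])}
print(sub_datasets)               # {0: array([0, 2]), 1: array([1])}
```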
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Empirical Results</title>
      <sec id="sec-3-1">
        <title>Datasets, Experiment Setup, and Baselines</title>
        <p>
          The North America dataset is a large dataset of tweets that was originally used for the geolocation prediction problem [
          <xref ref-type="bibr" rid="ref19 ref20 ref7">19,20,7</xref>
          ]. A document in this dataset is the concatenation of all tweets by a single user. There were in total 38 million tweets by 430k users. The tweets lay inside a bounding box covering the contiguous United States, a part of Canada and a part of Mexico. The final dataset after preprocessing is a very sparse text file that requires 2.2GB to store and contains 430,000 rows, i.e. the number of documents, and 59,368 columns, i.e. the vocabulary size.
        </p>
        <p>Fig. 1. Experiment results on the Tweets dataset. Figure 1(a) shows the discovered topics at the first level; the remaining panels, (b) Student Life and Relationship, (c) Information and Networking, (d) Business and Current Affairs, (e) Routine Activities, and (f) Leisure and Entertainment, present the discovered topics at the second level. The appropriate numbers of topics k are identified by peaks in the plots; the vertical lines mark the highest peaks k.</p>
        <p>Fig. 2. Second-level results (continued): (a) Sport and Games, (b) Pessimism and Negativity, (c) Wishes and Gratitude, (d) Transport and Travel.</p>
        <p>In our experiment, we set the required model parameters as follows. The range of explored topics at the first level is {5, …, 25}. We expect the range of sub-topics to be smaller at the second level, so the range of explored sub-topics is {2, …, 12}. The number of top tokens that characterize a specific topic is set to t = 20. The sampling rate is set to r = 0.8 and the number of subsets is set to τ = 25 to cancel out random effects. Our experiments were conducted on a Xeon E5-2670v2 with 2.5GHz clock speed and 128GB of RAM. However, an upper bound of the RAM required for our model is 5GB, and it takes 4 days to complete.</p>
        <p>
          As we already mentioned in the introduction, during our experiments we also compare our method with several state-of-the-art NMF-based and LDA-based topic modeling approaches [
          <xref ref-type="bibr" rid="ref1 ref13 ref6">13,1,6</xref>
          ]. However, all these models either cannot handle a large dataset or throw resource exceptions during computation.
The framework identifies distinctive topics and their sub-topics of documents based on the output of stability scores. In theory, the deepest hierarchy of topics is reached when documents are recursively classified until each topic contains only one document. We do not specify the exact number of topics beforehand but rather the range of desired topics, and the model will figure out the most appropriate values itself. In other words, the model takes (1) a very large textual dataset, (2) a desired range for the expected number of topics, and (3) a desired level of hierarchy. Then, the hierarchy of topics is discovered by considering conceptual stability scores.
        </p>
        <p>Figures 1 and 2 present the potential numbers of topics and sub-topics at the first and second levels, respectively. Table 1 summarizes the topics and their explored sub-topics. As we can see in Table 1, topics at the first level can be divided into two groups based on their % share. People are concerned most about Pessimism and Negativity, Leisure and Entertainment, Student Life and Relationship, and Business and Current Affairs. We now describe all the topics and their discovered sub-topics in more detail.</p>
        <p>At the first level, the highest peak is found at k = 9, which means that the most distinctive number of topics for the North America tweet dataset is 9. However, we also see potentially high peaks at k = 11 and k = 7 if we need to manually expand or condense the clustering results, respectively. Consequently, the whole dataset at the first level is divided into 9 sub-datasets within which the model continues discovering sub-topics.</p>
        <p>Next we consider sub-topics. Figure 1(b) shows that the highest peak is at k = 5, where we clearly see a peak shape. Similarly, we see the same shape for the 4th, 6th, 7th and 8th topics, which are presented in Figures 1(e), 2(a), 2(b) and 2(c), respectively. The peaks formed by this shape give the number of sub-topics discovered by the model. Interestingly, the 2nd topic, Figure 1(c), contains only 4.11% of the documents but can be divided into k = 8 distinctive sub-topics. The 3rd and 9th topics, Figures 1(d) and 2(d) respectively, show an obvious peak indicating that the most suitable number of sub-topics is k = 2, the leftmost bound of the experimented range. The 5th topic, Figure 1(f), presents two candidates with high-magnitude peaks at k = 2 and k = 7. Although the highest peak, i.e. k = 2, is selected as the output for the sub-topic consideration, the user can manually choose the other peak as the desired output.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Topics Labeling</title>
        <p>Having exploited the hierarchical topic structure, we next present the associated labels. Table 2 summarizes our labeling schemes. All topics and sub-topics were subjectively labeled to ease understanding and interpretation in the subsequent spatial distribution analysis. The labels were validated and assigned based on the meaning of the top tokens that characterize a specific topic or sub-topic.</p>
        <p>More generally, questions of accuracy can be raised about the representativeness of the labels as a demonstration of the topics. For each discovered topic and sub-topic, after collecting the top tokens based on their meaning contribution, a number of heuristic labeling schemes are considered to render each topic representative and distinctive. After the labels are generated, randomly selected documents are reviewed and the labels are re-validated if needed. This loop is required to ensure that the assigned labels are acceptably appropriate. It is important to consider that the labeling results in this paper reflect Twitter users' opinions at the time the data was collected, not those of the population at large. The revealed Twitter topics were also visualized using a comparison word cloud of the top tokens in all topics and sub-topics, e.g. Figure 3. We report a principal component analysis to inspect the subjective distinctiveness of topics in Figure 4.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we propose a topic selection approach that integrates smoothly with NMF and can be applied to large datasets effectively. The model automatically discovers the most distinctive topics and sub-topics over as many levels of hierarchy as desired by considering conceptual stability scores. The conceptual analysis helps guide the selection of the appropriate number of topics and their sub-topics. The main strength of our approach is that it is entirely unsupervised and does not require any training step. We also demonstrate the practicability of our framework for getting a better understanding of a textual source. Starting from addressing the drawbacks of consensus matrix models, which have existed for more than a decade, we have provided an effective and powerful framework for large-scale text mining and document clustering via NMF. We also show that several state-of-the-art LDA-based topic modeling approaches are unable to handle large datasets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arun</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suresh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madhavan</surname>
            ,
            <given-names>C.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murthy</surname>
            ,
            <given-names>M.N.</given-names>
          </string-name>
          :
          <article-title>On finding the natural number of topics with latent dirichlet allocation: Some observations</article-title>
          .
          <source>In: Pacific-Asia Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <volume>391</volume>
          {
          <fpage>402</fpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Berry</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Browne</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Email surveillance using non-negative matrix factorization</article-title>
          .
          <source>Computational &amp; Mathematical Organization Theory</source>
          <volume>11</volume>
          (
          <issue>3</issue>
          ),
          <volume>249</volume>
          {
          <fpage>264</fpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>Journal of machine Learning research 3(Jan)</source>
          ,
          <volume>993</volume>
          {
          <fpage>1022</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Brunet</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tamayo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golub</surname>
            ,
            <given-names>T.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mesirov</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>Metagenes and molecular pattern discovery using matrix factorization</article-title>
          .
          <source>Proceedings of the national academy of sciences 101(12)</source>
          ,
          <volume>4164</volume>
          {
          <fpage>4169</fpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cichocki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anh-Huy</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Fast local algorithms for large scale nonnegative matrix and tensor factorizations</article-title>
          .
          <source>IEICE transactions on fundamentals of electronics, communications and computer sciences 92(3)</source>
          ,
          <volume>708</volume>
          {
          <fpage>721</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Deveaud</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , SanJuan, E.,
          <string-name>
            <surname>Bellot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Accurate and effective latent concept modeling for ad hoc information retrieval</article-title>
          .
          <source>Document numerique 17(1)</source>
          ,
          <volume>61</volume>
          {
          <fpage>84</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Duong-Trung</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schilling</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt-Thieme</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Near real-time geolocation prediction in twitter streams via matrix factorization based regression</article-title>
          .
          <source>In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management</source>
          . pp.
          <year>1973</year>
          {
          <year>1976</year>
          . ACM (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fellbaum</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : WordNet. Wiley Online Library (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gillis</surname>
          </string-name>
          , N.:
          <article-title>The why and how of nonnegative matrix factorization</article-title>
          . Regularization, Optimization, Kernels, and
          <source>Support Vector Machines</source>
          <volume>12</volume>
          (
          <issue>257</issue>
          ) (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hornik</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , Grun, B.:
          <article-title>topicmodels: An R package for fitting topic models</article-title>
          .
          <source>Journal of Statistical Software</source>
          <volume>40</volume>
          (
          <issue>13</issue>
          ),
          <volume>1</volume>
          {
          <fpage>30</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddy</surname>
            ,
            <given-names>C.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
          </string-name>
          , H.:
          <article-title>Simultaneous discovery of common and discriminative topics via joint nonnegative matrix factorization</article-title>
          .
          <source>In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <volume>567</volume>
          {
          <fpage>576</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
          </string-name>
          , H.:
          <article-title>Sparse nonnegative matrix factorization for clustering</article-title>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><surname>Kuang</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Choo</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Park</surname>, <given-names>H.</given-names></string-name>: <article-title>Nonnegative matrix factorization for interactive topic modeling and document clustering</article-title>. <source>In: Partitional Clustering Algorithms</source>, pp. <fpage>215</fpage>&#8211;<lpage>243</lpage>. Springer (<year>2015</year>)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name><surname>Kuhn</surname>, <given-names>H.W.</given-names></string-name>: <article-title>The Hungarian method for the assignment problem</article-title>. <source>Naval Research Logistics Quarterly</source> <volume>2</volume>(<issue>1-2</issue>), <fpage>83</fpage>&#8211;<lpage>97</lpage> (<year>1955</year>)
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name><surname>Miller</surname>, <given-names>G.A.</given-names></string-name>: <article-title>WordNet: a lexical database for English</article-title>. <source>Communications of the ACM</source> <volume>38</volume>(<issue>11</issue>), <fpage>39</fpage>&#8211;<lpage>41</lpage> (<year>1995</year>)
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name><surname>Monti</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Tamayo</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Mesirov</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Golub</surname>, <given-names>T.</given-names></string-name>: <article-title>Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data</article-title>. <source>Machine Learning</source> <volume>52</volume>(<issue>1-2</issue>), <fpage>91</fpage>&#8211;<lpage>118</lpage> (<year>2003</year>)
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name><surname>Pauca</surname>, <given-names>V.P.</given-names></string-name>, <string-name><surname>Shahnaz</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Berry</surname>, <given-names>M.W.</given-names></string-name>, <string-name><surname>Plemmons</surname>, <given-names>R.J.</given-names></string-name>: <article-title>Text mining using nonnegative matrix factorizations</article-title>. <source>In: SDM</source>, vol. <volume>4</volume>, pp. <fpage>452</fpage>&#8211;<lpage>456</lpage>. SIAM (<year>2004</year>)
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name><surname>Porteous</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Newman</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Ihler</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Asuncion</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Smyth</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Welling</surname>, <given-names>M.</given-names></string-name>: <article-title>Fast collapsed Gibbs sampling for latent Dirichlet allocation</article-title>. <source>In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, pp. <fpage>569</fpage>&#8211;<lpage>577</lpage>. ACM (<year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name><surname>Roller</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Speriosu</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Rallapalli</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Wing</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Baldridge</surname>, <given-names>J.</given-names></string-name>: <article-title>Supervised text-based geolocation using language models on an adaptive grid</article-title>. <source>In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning</source>, pp. <fpage>1500</fpage>&#8211;<lpage>1510</lpage>. Association for Computational Linguistics (<year>2012</year>)
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name><surname>Wing</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Baldridge</surname>, <given-names>J.</given-names></string-name>: <article-title>Hierarchical discriminative classification for text-based geolocation</article-title>. <source>In: EMNLP</source>, pp. <fpage>336</fpage>&#8211;<lpage>348</lpage> (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name><surname>Wu</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Palmer</surname>, <given-names>M.</given-names></string-name>: <article-title>Verb semantics and lexical selection</article-title>. <source>In: Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics</source>, pp. <fpage>133</fpage>&#8211;<lpage>138</lpage>. Association for Computational Linguistics (<year>1994</year>)
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name><surname>Xie</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Song</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Park</surname>, <given-names>H.</given-names></string-name>: <article-title>Topic modeling via nonnegative matrix factorization on probability simplex</article-title>. <source>In: NIPS Workshop on Topic Models: Computation, Application, and Evaluation</source> (<year>2013</year>)
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name><surname>Xu</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Gong</surname>, <given-names>Y.</given-names></string-name>: <article-title>Document clustering based on non-negative matrix factorization</article-title>. <source>In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>, pp. <fpage>267</fpage>&#8211;<lpage>273</lpage>. ACM (<year>2003</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>