<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ARTM vs. LDA: an SVD Extension Case Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergey Nikolenko</string-name>
          <email>sergey@logic.pdmi.ras.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Deloitte Analytics Institute</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kazan (Volga Region) Federal University</institution>
          ,
          <addr-line>Kazan</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Laboratory for Internet Studies, NRU Higher School of Economics</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Steklov Institute of Mathematics at St. Petersburg</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this work, we compare two extensions of two different topic models for the same problem of recommending full-text items: previously developed SVD-LDA and its counterpart SVD-ARTM based on additive regularization. We show that ARTM naturally leads to the inference algorithm that has to be painstakingly developed for LDA. Topic models are an important part of the natural language processing landscape, providing unsupervised ways to quickly evaluate what a whole corpus of texts is about and classify the texts into well-defined topics. LDA extensions provide ways to augment basic topic models with additional information and retool them to serve other purposes. In a previous work, we combined the SVD and LDA decompositions into a single unified model that optimizes the joint likelihood function and thus infers topics that are especially useful for improving recommendations. We provided an inference algorithm based on Gibbs sampling, developing an approximate sampling scheme based on a first order approximation to Gibbs sampling [1]. The recently developed ARTM approach [2-5] extends the basic pLSA model with regularizers and provides a unified way to add new additive regularizers; inference algorithms follow from simple differentiation of the regularizers. In this work, we apply ARTM to the problem of adding SVD decompositions to a topic model; we show that one can automatically arrive at an inference algorithm very similar to our previous SVD-LDA approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>LDA and SVD-LDA</title>
      <p>
        The graphical model of LDA [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] is shown on Figure 1a. We assume that a corpus of $D$ documents contains $T$ topics expressed by $W$ different words. Each document $d \in D$ is modeled as a discrete distribution $\theta^{(d)}$ on the set of topics: $p(z_w = j) = \theta^{(d)}_j$, where $z$ is a discrete variable that defines the topic of each word $w \in d$. Each topic, in turn, corresponds to a multinomial distribution on words: $p(w \mid z_w = k) = \phi_{w,k}$. The model also introduces prior Dirichlet distributions with parameters $\alpha$ for the topic vectors $\theta$, $\theta \sim \mathrm{Dir}(\alpha)$, and $\beta$ for the word distributions $\phi$, $\phi \sim \mathrm{Dir}(\beta)$.
      </p>
      <p>[Figure 1: (a) the graphical model of LDA; (b) the graphical model of sLDA.]</p>
      <p>
        A document is generated word by word: for each word, we (1) sample the topic index $k$ from the distribution $\theta^{(d)}$; (2) sample the word $w$ from the distribution $\phi_{\cdot,k}$. Inference in LDA is usually done via either variational approximations or Gibbs sampling; we use the latter since it is easy to generalize to further extensions. In the basic LDA model, Gibbs sampling reduces to the so-called collapsed Gibbs sampling, where the $\theta$ and $\phi$ variables are integrated out, and the $z_w$ are iteratively resampled according to the following distribution:
        $$p(z_w = t \mid \mathbf{z}_{-w}, \mathbf{w}, \alpha, \beta) \propto \frac{n^{(d_w)}_{-w,t} + \alpha}{\sum_{t' \in T} n^{(d_w)}_{-w,t'} + T\alpha} \cdot \frac{n^{(w)}_{-w,t} + \beta}{\sum_{w' \in W} n^{(w')}_{-w,t} + W\beta},$$
        where $n^{(d_w)}_{-w,t}$ is the number of words in document $d$ chosen with topic $t$ and $n^{(w)}_{-w,t}$ is the number of times word $w$ has been generated from topic $t$, apart from the current value of $z_w$; both counters depend on the other variables $\mathbf{z}_{-w}$. Samples are then used to estimate model variables:
        $$\theta_{d,t} = \frac{n^{(d)}_{t} + \alpha}{\sum_{t' \in T} n^{(d)}_{t'} + T\alpha}, \qquad \phi_{w,t} = \frac{n^{(w)}_{t} + \beta}{\sum_{w' \in W} n^{(w')}_{t} + W\beta},$$
        where $\phi_{w,t}$ denotes the probability to draw word $w$ in topic $t$ and $\theta_{d,t}$ is the probability to draw topic $t$ for a word in document $d$.
      </p>
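      <p>To make the collapsed Gibbs sampler above concrete, the following is a minimal sketch in Python/NumPy; the corpus representation (a list of word-index lists) and all identifiers are illustrative assumptions rather than part of the model itself.</p>
      <preformat>
import numpy as np

def gibbs_lda(docs, W, T, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word indices in 0..W-1.
    Returns point estimates of theta (D x T) and phi (W x T).
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dt = np.zeros((D, T))   # words of document d assigned to topic t
    n_wt = np.zeros((W, T))   # occurrences of word w assigned to topic t
    n_t = np.zeros(T)         # total words assigned to topic t
    z = [rng.integers(T, size=len(doc)) for doc in docs]   # random initial topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_dt[d, t] += 1; n_wt[w, t] += 1; n_t[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # remove the current assignment from the counters
                n_dt[d, t] -= 1; n_wt[w, t] -= 1; n_t[t] -= 1
                # document factor (its denominator does not depend on t) times word factor
                p = (n_dt[d] + alpha) * (n_wt[w] + beta) / (n_t + W * beta)
                t = rng.choice(T, p=p / p.sum())
                z[d][i] = t
                n_dt[d, t] += 1; n_wt[w, t] += 1; n_t[t] += 1
    theta = (n_dt + alpha) / (n_dt.sum(axis=1, keepdims=True) + T * alpha)
    phi = (n_wt + beta) / (n_wt.sum(axis=0, keepdims=True) + W * beta)
    return theta, phi
      </preformat>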
      <p>
        The basic LDA model has been used and extended in numerous
applications; the relevant class of extensions for us now takes into account additional
information that may be available together with the documents and may
reveal additional insights into the topical structure. For instance, the Topics over
Time model and dynamic topic models apply when documents have timestamps
of their creation (e.g., news articles or blog posts) [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8–10</xref>
        ], DiscLDA assumes that
each document is assigned a categorical label and attempts to utilize LDA
for mining topic classes related to classification [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], the Author-Topic model
incorporates information about the authors of a document [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ], and so on.
      </p>
      <p>
        The SVD-LDA model, presented in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], can be regarded as an extension of the Supervised LDA (sLDA) model [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The sLDA graphical model is shown on Fig. 1b. In sLDA, each document is augmented with a response variable $y$ drawn from a normal distribution centered around a linear combination of the document's topical distribution ($\bar{z}$, the average of the $z$ variables in this document) with some unknown parameters $b$, $a$ that are also to be trained: $y \sim \mathcal{N}(y \mid b^\top \bar{z} + a, \sigma^2)$.
      </p>
      <p>
        The original work [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] presents an inference algorithm for sLDA based on variational approximations, but in this work we operate with Gibbs sampling, which will be easier to extend to SVD-LDA later. Thus, we show an sLDA Gibbs sampling scheme. It differs from the original LDA in that the model likelihood gets another factor corresponding to the $y$ variable: $p(y_d \mid \mathbf{z}, b, \sigma^2) \propto \exp\left(-(y_d - b^\top \bar{z}_d - a)^2 / 2\sigma^2\right)$, and the total likelihood is now
        $$p(\mathbf{z} \mid \mathbf{w}, \mathbf{y}, b, \sigma^2) \propto \prod_d \frac{B(n_d + \alpha)}{B(\alpha)} \prod_t \frac{B(n_t + \beta)}{B(\beta)} \prod_d e^{-(y_d - b^\top \bar{z}_d - a)^2 / 2\sigma^2}.$$
        On each iteration of the sampling algorithm, we now first sample $\mathbf{z}$ for fixed $b$ and then train $b$ for fixed (sampled) $\mathbf{z}$. The sampling distributions for each $z$ variable, according to the equation above, are
        $$p(z_w = t \mid \mathbf{z}_{-w}, \mathbf{w}, \alpha, \beta) \propto q(z_w, t, \mathbf{z}_{-w}, \mathbf{w}, \alpha, \beta)\, e^{-\frac{1}{2\sigma^2}(y_d - b^\top \bar{z} - a)^2} = \frac{n^{(d_w)}_{-w,t} + \alpha}{\sum_{t' \in T} n^{(d_w)}_{-w,t'} + T\alpha} \cdot \frac{n^{(w)}_{-w,t} + \beta}{\sum_{w' \in W} n^{(w')}_{-w,t} + W\beta}\, e^{-\frac{1}{2\sigma^2}(y_d - b^\top \bar{z} - a)^2}.$$
        The latter equation can be either used directly or further transformed by separating $\mathbf{z}_{-w}$ explicitly.
      </p>
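      <p>A minimal sketch of the resulting per-word sampling step, with the LDA factor reweighted by the Gaussian response term; the variable names are assumptions for illustration only.</p>
      <preformat>
import numpy as np

def slda_topic_probs(q, zbar, old_t, N_d, y_d, b, a, sigma2):
    """Sampling distribution for the topic of one word under sLDA:
    the LDA factor q (length T) times the Gaussian response factor.

    zbar: the document's averaged topic-indicator vector (length T); moving the word
    from its current topic old_t to a candidate topic t shifts zbar by -1/N_d and +1/N_d.
    """
    T = len(q)
    probs = np.empty(T)
    for t in range(T):
        zbar_t = zbar.copy()
        zbar_t[old_t] -= 1.0 / N_d
        zbar_t[t] += 1.0 / N_d
        resid = y_d - b @ zbar_t - a            # residual of the Gaussian response
        probs[t] = q[t] * np.exp(-resid ** 2 / (2.0 * sigma2))
    return probs / probs.sum()
      </preformat>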
      <p>
        SVD-LDA considers a recommender system based on likes and dislikes, so it uses the logistic sigmoid $\sigma(x) = 1 / (1 + \exp(-x))$ of a linear function to model the probability of a “like”: $p(\mathrm{success}_{i,a}) = \sigma(b^\top \bar{z} + a)$. In this version of sLDA, the graphical model remains the same, only conditional probabilities change. The total likelihood is now
        $$p(\mathbf{z} \mid \mathbf{w}, \mathbf{y}, b, \alpha, \beta, \sigma^2) \propto \prod_d \frac{B(n_d + \alpha)}{B(\alpha)} \prod_t \frac{B(n_t + \beta)}{B(\beta)} \prod_d \prod_{x \in X_d} \sigma(b^\top \bar{z}_d + a)^{y_x} \left(1 - \sigma(b^\top \bar{z}_d + a)\right)^{1 - y_x},$$
        where $X_d$ is the set of experiments (ratings) for document $d$, and $y_x$ is the binary result of one such experiment. The sampling procedure also remains the same, except that now we train logistic regression with respect to $b$, $a$ for fixed $\mathbf{z}$ instead of linear regression, and the sampling probabilities for each $z$ variable are now
        $$p(z_w = t \mid \mathbf{z}_{-w}, \mathbf{w}, \alpha, \beta) \propto q(z_w, t, \mathbf{z}_{-w}, \mathbf{w}, \alpha, \beta) \prod_{x \in X_d} \left[\sigma(b^\top \bar{z}_d + a)\right]^{y_x} \left[1 - \sigma(b^\top \bar{z}_d + a)\right]^{1 - y_x} = q(z_w, t, \mathbf{z}_{-w}, \mathbf{w}, \alpha, \beta)\, e^{s_d \log p_d + (|X_d| - s_d) \log(1 - p_d)},$$
        where $s_d$ is the number of successful experiments among $X_d$, and $p_d = \frac{1}{1 + e^{-b^\top \bar{z}_d - a}}$.
      </p>
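      <p>In the like/dislike setting the Gaussian factor is thus replaced by the Bernoulli likelihood written above in log form; a small illustrative sketch (all names assumed):</p>
      <preformat>
import numpy as np

def bernoulli_response_weight(zbar_t, s_d, n_ratings, b, a):
    """Likelihood factor for a candidate topic assignment in the like/dislike version:
    exp(s_d * log p_d + (|X_d| - s_d) * log(1 - p_d)) with p_d = sigma(b^T zbar + a).

    zbar_t: averaged topic vector with the current word moved to the candidate topic,
    s_d: number of likes among the document's ratings, n_ratings: |X_d|.
    """
    p_d = 1.0 / (1.0 + np.exp(-(b @ zbar_t + a)))
    p_d = np.clip(p_d, 1e-12, 1.0 - 1e-12)   # guard against log(0)
    return np.exp(s_d * np.log(p_d) + (n_ratings - s_d) * np.log(1.0 - p_d))
      </preformat>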
      <p>
        The SVD-LDA extension has been introduced in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as follows: for recommendations we use an SVD model with additional predictors corresponding to how much a certain user or group of users likes the topics trained in the LDA model; since our dataset is binary (like-dislike), we use a logistic version of the SVD model: $p(\mathrm{success}_{i,a}) = \sigma(\hat{r}_{i,a}) = \sigma\left(\mu + b_i + b_a + q_a^\top p_i + \theta_a^\top l_i\right)$, where $p_i$ may be absent in case of cold start, and $l_i$ may be shared among groups (clusters) of users. The total likelihood of the dataset with ratings comprised of triples $D = \{(i, a, r)\}$ (user $i$ rated item $a$ as $r \in \{-1, 1\}$) is a product of the likelihood of each rating (assuming, as usual, that they are independent): $p(D \mid \mu, b_i, b_a, p_i, q_a, l_i, \theta_a) = \prod_D \sigma(\hat{r}_{i,a})^{[r=1]} \left(1 - \sigma(\hat{r}_{i,a})\right)^{[r=-1]}$, and the logarithm is
        $$\log p(D \mid \mu, b_i, b_a, p_i, q_a, l_i, \theta_a) = \sum_D \left([r = 1] \log \sigma(\hat{r}_{i,a}) + [r = -1] \log\left(1 - \sigma(\hat{r}_{i,a})\right)\right),$$
        where $[r = 1] = 1$ if $r = 1$ and $[r = 1] = 0$ otherwise, and $\theta_a$ is the vector of topics trained for document $a$ in the LDA model, $\theta_a = \frac{1}{N_a} \sum_{w \in a} z_w$, where $N_a$ is the length of document $a$. Sampling probabilities for each $z$ variable now look like
        $$p(z_w = t \mid \mathbf{z}_{-w}, \mathbf{w}, \alpha, \beta) \propto q(z_w, t, \mathbf{z}_{-w}, \mathbf{w}, \alpha, \beta)\, p(D \mid \mu, b_i, b_a, p_i, q_a, l_i, \theta_{a,w \to t}) = q(z_w, t, \mathbf{z}_{-w}, \mathbf{w}, \alpha, \beta) \prod_D \sigma\left(\hat{r}^{\mathrm{SVD}}_{i,a} + l_i^\top \theta_{a,w \to t}\right)^{[r=1]} \left(1 - \sigma\left(\hat{r}^{\mathrm{SVD}}_{i,a} + l_i^\top \theta_{a,w \to t}\right)\right)^{[r=-1]},$$
        where $\hat{r}^{\mathrm{SVD}}_{i,a} = \mu + b_i + b_a + q_a^\top p_i$, and $\theta_{a,w \to t}$ is the vector of topics for document $a$ where topic $t$ is substituted in place of $z_w$. We see that in the formula above, to compute the sampling distribution for a single $z_w$ variable one has to take a sum over all ratings all users have provided for this document, and due to the presence of the sigmoid function one cannot cancel out terms and reduce the sum to updating counts. It is possible to store precomputed values of $\hat{r}^{\mathrm{SVD}}_{i,a}$ in memory, but it does not help because the $z_w$ variables change during sampling, and when they do all values of $\sigma\left(\hat{r}^{\mathrm{SVD}}_{i,a} + l_i^\top \theta_{a,w \to t}\right)$ also have to be recomputed for each rating from the database.
      </p>
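      <p>For concreteness, a sketch of the logistic SVD predictor with the topic term, with all identifiers hypothetical:</p>
      <preformat>
import numpy as np

def svd_lda_like_probability(mu, b_i, b_a, p_i, q_a, l_i, theta_a):
    """Probability of a 'like': sigma(mu + b_i + b_a + q_a^T p_i + theta_a^T l_i)."""
    r_hat = mu + b_i + b_a + q_a @ p_i + theta_a @ l_i
    return 1.0 / (1.0 + np.exp(-r_hat))
      </preformat>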
      <p>
        To make the model feasible, a simplified SVD-LDA training algorithm was developed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that could run reasonably fast on large datasets. It used a first order approximation to the log likelihood based on its Taylor series at zero:
        $$\frac{\partial}{\partial \theta_a} \log p(D \mid \mu, b_i, b_a, p_i, q_a, l_i, \theta_a) = \sum_D \left([r = 1] - \sigma\left(\hat{r}^{\mathrm{SVD}}_{i,a} + \theta_a^\top l_i\right)\right) l_i. \qquad (1)$$
      </p>
      <p>
        We denote $s_a = \sum_D \left([r = 1] - \sigma\left(\hat{r}^{\mathrm{SVD}}_{i,a} + \theta_a^\top l_i\right)\right) l_i$. We can now precompute $s_a$ (a vector over topics) for each document right after SVD training (with additional memory of the same size as the $\Theta$ matrix) and use it in LDA sampling:
        $$p(z_w = t \mid \mathbf{z}_{-w}, \mathbf{w}, \alpha, \beta) \propto q(z_w, t, \mathbf{z}_{-w}, \mathbf{w}, \alpha, \beta)\, p(D \mid \mu, b_i, b_a, p_i, q_a, l_i, \theta_{a,w \to t}),$$
        and the latter is proportional to simply
        $$\frac{n^{(d_w)}_{-w,t} + \alpha}{\sum_{t' \in T} n^{(d_w)}_{-w,t'} + T\alpha} \cdot \frac{n^{(w)}_{-w,t} + \beta}{\sum_{w' \in W} n^{(w')}_{-w,t} + W\beta}\, e^{s_t \theta_t},$$
        because $s_a^\top \theta_{a,w \to t} = s_a^\top \theta_a - s_{z_w} \theta_{z_w} + s_t \theta_t$, and the first two terms do not depend on $t$ which is being sampled. Thus, the first order approximation yields a simple modification of LDA sampling that incurs relatively small computational overhead as compared to the sampling itself.
      </p>
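      <p>Under the first order approximation, the per-word sampling step therefore reduces to reweighting the LDA factor by $e^{s_t \theta_t}$ with the precomputed vector $s_a$; a minimal illustrative sketch (identifiers assumed):</p>
      <preformat>
import numpy as np

def approx_svd_lda_probs(q, s_a, theta_a):
    """First-order-approximate SVD-LDA sampling distribution for one word:
    the plain LDA factor q (length T) reweighted by exp(s_t * theta_t),
    where s_a is the gradient vector precomputed after SVD training."""
    p = q * np.exp(s_a * theta_a)
    return p / p.sum()
      </preformat>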
      <p>
        We have outlined a general approximate sampling scheme; several different
variations are possible depending on which predictors are shared in the basic
SVD model, p(successi;a) = (r^i;a) : In general, a separate set of li features for
every user would lead to heavy overfitting, so we used two variations: either share
li = l among all users or share li = lc among certain clusters of users, preferably
inferred from some external information, e.g., demographic features. Both
variations can be used for cold start with respect to users. Table 1 summarizes the
results of experiments that show that SVD-LDA does indeed improve upon the
basic LDA model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
3
      </p>
    </sec>
    <sec id="sec-2">
      <title>SVD-ARTM</title>
      <p>
        In recent works [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2–4</xref>
        ], K. Vorontsov and coauthors demonstrated that if one
adds regularizers to the objective function at the training stage of the basic
probabilistic Latent Semantic Analysis (pLSA) model, which actually predates
LDA [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], one can impose a very wide variety of constraints on the resulting topic
model. This approach has been called Additive Regularization of Topic Models
(ARTM). In particular, the authors showed that one can formulate a regularizer
that imposes constraints on the smoothness of topic-document and word-topic
distributions that will correspond to the Bayesian approach expressed in LDA
(i.e., it will smooth out the distributions).
      </p>
      <p>
        Formally speaking, for a set of regularizers $R_i(\Phi, \Theta)$, $i = 1..r$, and regularization weights $\tau_i$, $i = 1..r$, we can extend the objective function to maximize
        $$L(\Phi, \Theta) + R(\Phi, \Theta) = \sum_{d \in D} \sum_{w \in W} n_{dw} \log p(w \mid d) + \sum_{i=1}^{r} \tau_i R_i(\Phi, \Theta).$$
        By the Karush–Kuhn–Tucker conditions, any solution of the resulting problem satisfies the following system of equations:
        $$p_{tdw} = \mathrm{norm}^{+}_{t \in T}\left(\phi_{wt} \theta_{td}\right), \qquad \phi_{wt} = \mathrm{norm}^{+}_{w \in W}\left(n_{wt} + \phi_{wt} \frac{\partial R}{\partial \phi_{wt}}\right), \qquad \theta_{td} = \mathrm{norm}^{+}_{t \in T}\left(n_{td} + \theta_{td} \frac{\partial R}{\partial \theta_{td}}\right),$$
        $$n_{wt} = \sum_{d \in D} n_{dw} p_{tdw}, \qquad n_{td} = \sum_{w \in d} n_{dw} p_{tdw},$$
        where $\mathrm{norm}^{+}$ denotes non-negative normalization: $\mathrm{norm}^{+}_{a \in A} x_a = \frac{\max\{x_a, 0\}}{\sum_{b \in A} \max\{x_b, 0\}}$. This system of equations yields a natural iterative algorithm (Newton's method) for finding the parameters $\phi_{wt}$ and $\theta_{td}$, equivalent to EM inference in pLSA; see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for a full derivation and a more detailed treatment. Thus, we have a model which is very easy to extend and which is computationally cheaper to train than the LDA model, especially LDA extensions that rely on Gibbs sampling.
      </p>
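      <p>A minimal sketch of the resulting regularized EM-style iteration, with the regularizer supplied as gradient callbacks; the identifiers are illustrative assumptions and this is not the BigARTM API.</p>
      <preformat>
import numpy as np

def artm_iteration(n_dw, phi, theta, dR_dphi, dR_dtheta):
    """One iteration of the ARTM system of equations for a regularizer R.

    n_dw: (D, W) document-word counts; phi: (W, T); theta: (D, T);
    dR_dphi(phi, theta) and dR_dtheta(phi, theta) return gradients of R
    with the same shapes as phi and theta, respectively.
    """
    n_wt = np.zeros_like(phi)
    n_dt = np.zeros_like(theta)
    for d in range(n_dw.shape[0]):
        # E-step: p_tdw = norm_t(phi_wt * theta_td), one document at a time
        p = phi * theta[d]                                   # shape (W, T)
        p /= np.maximum(p.sum(axis=1, keepdims=True), 1e-12)
        weighted = n_dw[d][:, None] * p
        n_wt += weighted
        n_dt[d] = weighted.sum(axis=0)
    # M-step with non-negative normalization norm^+
    phi_new = np.maximum(n_wt + phi * dR_dphi(phi, theta), 0.0)
    phi_new /= np.maximum(phi_new.sum(axis=0, keepdims=True), 1e-12)
    theta_new = np.maximum(n_dt + theta * dR_dtheta(phi, theta), 0.0)
    theta_new /= np.maximum(theta_new.sum(axis=1, keepdims=True), 1e-12)
    return phi_new, theta_new
      </preformat>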
      <p>
        To extend ARTM with an SVD-based regularizer, we begin with a regularizer in the same form as in the previous section: the total likelihood of the dataset with ratings comprised of triples $D = \{(i, a, r)\}$ (user $i$ rated item $a$ as $r \in \{-1, 1\}$) is a product of the likelihood of each rating, so its logarithm is
        $$R(\Phi, \Theta) = \log p(D \mid \mu, b_i, b_a, p_i, q_a, l_i, \theta_a) = \sum_D \left([r = 1] \log \sigma(\hat{r}_{i,a}) + [r = -1] \log\left(1 - \sigma(\hat{r}_{i,a})\right)\right),$$
        where $[r = 1] = 1$ if $r = 1$ and $[r = 1] = 0$ otherwise, $\theta_a$ is the vector of topics trained for document $a$ in the topic model, $\theta_a = \frac{1}{N_a} \sum_{w \in a} z_w$, where $N_a$ is the length of document $a$, and
        $$\hat{r}_{i,a} = \hat{r}^{\mathrm{SVD}}_{i,a} + \theta_a^\top l_i = \mu + b_i + b_a + q_a^\top p_i + \theta_a^\top l_i.$$
        To add this regularizer to the pLSA model, we have to compute its partial derivatives with respect to the parameters:
        $$\frac{\partial R}{\partial \theta_{ta}} = \sum_{(i,a,r) \in D} \left([r = 1] - \sigma\left(\hat{r}^{\mathrm{SVD}}_{i,a} + \theta_a^\top l_i\right)\right) l_{i,t};$$
        note that the latter equality is exactly the same as (1) (hence we omit the derivation), only now it is a direct part of the algorithm rather than a first order approximation to the sampling. The final algorithm is, thus, to iterate the following:
        $$p_{taw} = \mathrm{norm}^{+}_{t \in T}\left(\phi_{wt} \theta_{ta}\right), \qquad \phi_{wt} = \mathrm{norm}^{+}_{w \in W}\left(n_{wt}\right), \qquad n_{wt} = \sum_{a \in D} n_{aw} p_{taw}, \qquad n_{ta} = \sum_{w \in a} n_{aw} p_{taw},$$
        $$\theta_{ta} = \mathrm{norm}^{+}_{t \in T}\left(n_{ta} + \theta_{ta} \sum_{(i,a,r) \in D} \left([r = 1] - \sigma\left(\hat{r}^{\mathrm{SVD}}_{i,a} + \theta_a^\top l_i\right)\right) l_{i,t}\right).$$
        Similar to SVD-LDA, we can precompute $s_a = \sum_D \left([r = 1] - \sigma\left(\hat{r}^{\mathrm{SVD}}_{i,a} + \theta_a^\top l_i\right)\right) l_i$ (it is a vector over topics) for each document after SVD is trained and use it throughout a pLSA iteration.
      </p>
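      <p>Continuing the illustrative sketch above, the SVD regularizer then amounts to one gradient callback for $\Theta$ (and a zero gradient for $\Phi$), built from the precomputed vectors $s_a$; all identifiers below are assumptions for illustration.</p>
      <preformat>
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def precompute_s(ratings, theta, l_users, mu, b_user, b_item, p_user, q_item):
    """s_a = sum over the ratings of document a of ([r = 1] - sigma(r_hat_{i,a})) * l_i."""
    D, T = theta.shape
    s = np.zeros((D, T))
    for i, a, r in ratings:
        r_hat = mu + b_user[i] + b_item[a] + q_item[a] @ p_user[i] + theta[a] @ l_users[i]
        s[a] += ((1.0 if r == 1 else 0.0) - sigmoid(r_hat)) * l_users[i]
    return s

def svd_regularizer_grads(ratings, l_users, mu, b_user, b_item, p_user, q_item):
    """Gradient callbacks for the generic artm_iteration sketch above:
    the SVD regularizer does not depend on phi, and dR/dtheta_{ta} = s_{a,t}."""
    dR_dphi = lambda phi, theta: np.zeros_like(phi)
    dR_dtheta = lambda phi, theta: precompute_s(
        ratings, theta, l_users, mu, b_user, b_item, p_user, q_item)
    return dR_dphi, dR_dtheta
      </preformat>
      <p>In an actual implementation, the vectors $s_a$ would be recomputed once per pass, right after the SVD parameters are retrained, and reused throughout the pLSA iteration as described above.</p>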
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>
        In this work, we have developed an ARTM regularizer that adds an SVD-based matrix decomposition model on top of the basic topic model. We have shown that the resulting
inference algorithms closely match the inference algorithms developed in the
SVD-LDA modification of LDA with a first-order approximation to the Gibbs
sampling. In further work, we plan to implement this regularizer and incorporate
it into the BigARTM library [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>Acknowledgements. This work was supported by the Russian Science Foundation
grant no. 15-11-10019.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Nikolenko</surname>
            ,
            <given-names>S.I.</given-names>
          </string-name>
          :
          <article-title>SVD-LDA: Topic modeling for full-text recommender systems</article-title>
          .
          <source>In: Proc. 14th Mexican International Conference on Artificial Intelligence. LNAI</source>
          vol.
          <volume>9414</volume>
          , Springer (
          <year>2015</year>
          )
          <fpage>67</fpage>
          -
          <lpage>79</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Vorontsov</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Additive regularization for topic models of text collections</article-title>
          .
          <source>Doklady Mathematics</source>
          <volume>89</volume>
          (
          <year>2014</year>
          )
          <fpage>301</fpage>
          -
          <lpage>304</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Potapenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vorontsov</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Robust pLSA performs better than LDA</article-title>
          .
          <source>In: Proc. 35th European Conf. on IR Research</source>
          . LNCS vol.
          <volume>7814</volume>
          , Springer (
          <year>2013</year>
          )
          <fpage>784</fpage>
          -
          <lpage>787</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Vorontsov</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frei</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Apishev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suvorova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yanina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Nonbayesian additive regularization for multimodal topic modeling of large collections</article-title>
          .
          <source>In: Proc. of the 2015 Workshop on Topic Models: Post-Processing and Applications. TM '15</source>
          , New York, NY, USA, ACM (
          <year>2015</year>
          )
          <fpage>29</fpage>
          -
          <lpage>37</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Sokolov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bogolubsky</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Topic models regularization and initialization for regression problems</article-title>
          .
          <source>In: Proc. of the 2015 Workshop on Topic Models: PostProcessing and Applications</source>
          . TM '
          <volume>15</volume>
          , New York, NY, USA, ACM (
          <year>2015</year>
          )
          <fpage>21</fpage>
          -
          <lpage>27</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Latent Dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          (
          <year>2003</year>
          )
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Griffiths</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steyvers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Finding scientific topics</article-title>
          .
          <source>Proceedings of the National Academy of Sciences 101 (Suppl. 1)</source>
          (
          <year>2004</year>
          )
          <fpage>5228</fpage>
          -
          <lpage>5235</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Topics over time: a non-Markov continuous-time model of topical trends</article-title>
          .
          <source>In: Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , New York, NY, USA, ACM (
          <year>2006</year>
          )
          <fpage>424</fpage>
          -
          <lpage>433</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Dynamic topic models</article-title>
          .
          <source>In: Proc. of the 23rd International Conference on Machine Learning</source>
          , New York, NY, USA, ACM (
          <year>2006</year>
          )
          <fpage>113</fpage>
          -
          <lpage>120</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heckerman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Continuous time dynamic topic models</article-title>
          .
          <source>In: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lacoste-Julien</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sha</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.:</given-names>
          </string-name>
          <article-title>DiscLDA: Discriminative learning for dimensionality reduction and classification</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          <volume>21</volume>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rosen-Zvi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Griffiths</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steyvers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>The author-topic model for authors and documents</article-title>
          .
          <source>In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence</source>
          , Arlington, Virginia, United States, AUAI Press (
          <year>2004</year>
          )
          <fpage>487</fpage>
          -
          <lpage>494</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Rosen-Zvi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chemudugunta</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Griffiths</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steyvers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Learning author-topic models from text corpora</article-title>
          .
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>28</volume>
          (
          <year>2010</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McAuliffe</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Supervised topic models</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          <volume>20</volume>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Unsupervised learning by probabilistic latent semantic analysis</article-title>
          .
          <source>Machine Learning</source>
          <volume>42</volume>
          (
          <year>2001</year>
          )
          <fpage>177</fpage>
          -
          <lpage>196</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>