<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Rocco Tripodi Sapienza NLP Group Department of Computer Science, Sapienza University of Rome</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>English. This paper presents a new topic modelling framework inspired by game-theoretic principles. It is formulated as a normal-form game in which words are represented as players and topics as the strategies that the players select. The strategies of each player are modelled with a probability distribution guided by a utility function that the players try to maximize. This function induces players to select strategies similar to those selected by similar players and to choose strategies not shared with dissimilar players. The proposed framework is compared with state-of-the-art models and demonstrates good performance on standard benchmarks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Abstract (translated from Italian)</title>
      <p>Italiano (English translation). This article presents a topic modelling approach inspired by game theory. Topic modelling is viewed as a normal-form game in which the words are the players and the topics are the strategies that the players can choose. Each player chooses which strategies to employ through a probability distribution that is influenced by a utility function that the players try to maximize. This function encourages players to choose strategies similar to those employed by similar players and discourages the choice of strategies shared with dissimilar players. The comparison with state-of-the-art models shows good performance on several evaluation datasets.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>
        Topic modeling is a technique that discovers the
underlying topics contained in a collection of
documents
        <xref ref-type="bibr" rid="ref12 ref3">(Blei, 2012; Griffiths and Steyvers, 2004)</xref>
        .
It can be used in tasks such as text
classification, document retrieval, and sentiment analysis,
and it provides vector representations of both words
and documents. State-of-the-art systems are based
on probabilistic
        <xref ref-type="bibr" rid="ref1 ref18 ref2 ref22 ref6 ref7">(Blei et al., 2003; Mcauliffe and
Blei, 2008; Chong et al., 2009)</xref>
        and neural
network models
        <xref ref-type="bibr" rid="ref1 ref13 ref16 ref2 ref4 ref8">(Bengio et al., 2003; Hinton and
Salakhutdinov, 2009; Larochelle and Lauly, 2012;
Cao et al., 2015)</xref>
        . A different perspective based on
game theory is proposed in this article.
      </p>
      <p>
        The use of game-theoretic principles in machine
learning
        <xref ref-type="bibr" rid="ref11 ref23">(Goodfellow et al., 2014)</xref>
        , pattern
recognition
        <xref ref-type="bibr" rid="ref21">(Pavan and Pelillo, 2007)</xref>
        and natural
language processing
        <xref ref-type="bibr" rid="ref25 ref28">(Tripodi et al., 2016; Tripodi and
Navigli, 2019)</xref>
        problems is developing into a
promising field of research that has produced
original models. The main difference between
computational models based on optimization techniques
and game-theoretic models is that the former try
to maximize (or minimize) a function, which in many
cases is non-convex, whereas the latter try to find
the equilibrium state of a dynamical system. The
equilibrium concept is useful because it represents
a state in which all the constraints of a given
system are satisfied and no object of the system has
an incentive to deviate from it, because a
different configuration would lead to a worse
situation in terms of payoff and fitness, at both the object
and the system level. Furthermore, every finite game is
guaranteed to have at least one mixed-strategy Nash
equilibrium
        <xref ref-type="bibr" rid="ref20">(Nash, 1951)</xref>
        . So far, game-theoretic
models have been used in classification and
clustering tasks
        <xref ref-type="bibr" rid="ref21 ref24 ref26 ref27">(Pavan and Pelillo, 2007; Tripodi and
Pelillo, 2017)</xref>
        . In this work, we propose a
game-theoretic model for inferring a low-dimensional
representation of words that captures their
latent semantics.
      </p>
      <p>
        In this work, topic modeling is interpreted as a
symmetric non-cooperative game
        <xref ref-type="bibr" rid="ref30">(Weibull, 1997)</xref>
        in which the words are the players and the topics
are the strategies that the players can select. Two
players are matched to play a game together
according to the co-occurrence patterns found in the
corpus under study. The players use a probability
distribution over their strategies to play the games
and obtain a payoff for each strategy. This reward
helps them adjust their strategy selection in
future games according to which strategies have been
effective in previous games, so that more probability
mass is concentrated on the strategies that receive a high reward.
The payoff function is designed to create two
influence dynamics: the first
forces similar players (words that appear in
similar contexts) to select similar strategies; the
second forces dissimilar players (words that do
not share any context) to select different strategies.
The games are played repeatedly until the system
converges, that is, until the difference between the
strategy distributions of the players at time t and at
time t - 1 falls below a small threshold. The
convergence of the system corresponds to an equilibrium,
a situation in which there is an optimal association
of words and topics.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2 Related Work</title>
      <p>
        Hofmann (1999) proposed one of the earliest topic
models, probabilistic Latent Semantic Indexing
(pLSI). It represents each word in a document
as a sample from a mixture model, where
topics are represented as multinomial random
variables and documents as a mixture of topics.
Latent Dirichlet Allocation (LDA)
        <xref ref-type="bibr" rid="ref1 ref2">(Blei et al., 2003)</xref>
        ,
the most widely used topic model, is a
generalization of pLSI that introduces Dirichlet priors for
both the per-topic word distributions and the
per-document topic distributions.
This line of research has been extended by
building additional features on top of LDA, for example to
infer correlations among topics
        <xref ref-type="bibr" rid="ref15">(Lafferty and Blei,
2006)</xref>
        or to jointly model words and labels in a
supervised way
        <xref ref-type="bibr" rid="ref18 ref22">(Mcauliffe and Blei, 2008)</xref>
        .
      </p>
      <p>
        Topic models based on neural network
principles have been introduced with the neural
network language model proposed in
        <xref ref-type="bibr" rid="ref1 ref2">(Bengio et al.,
2003)</xref>
        . This paradigm is very popular in NLP, and
many topic models are based on it because
these techniques make it possible to obtain a
low-dimensional representation of the data. In
particular, auto-encoders
        <xref ref-type="bibr" rid="ref18 ref22">(Ranzato and Szummer, 2008)</xref>
        ,
Boltzmann machines
        <xref ref-type="bibr" rid="ref13">(Hinton and Salakhutdinov,
2009)</xref>
        and autoregressive distributions
        <xref ref-type="bibr" rid="ref16">(Larochelle
and Lauly, 2012)</xref>
        have been used to model
documents with layer-wise neural network tools.
Neural Topic Model (NTM;
        <xref ref-type="bibr" rid="ref4 ref8">(Cao et al., 2015)</xref>
        ) tries to
overcome some limitations of classical topic
models, such as the initialization problem and the
generalization to n-grams. It exploits word
embeddings to represent n-grams and uses
backpropagation to adjust the weights of the network between
the embedding and the word-topic and
document-topic layers. Another general framework for topic
modeling based on neural networks is Sparse
Contextual Hidden and Observed Language
AutoencodeR (SCHOLAR;
        <xref ref-type="bibr" rid="ref10 ref5">(Card et al., 2018)</xref>
        ). It allows
using covariates to influence the topic distributions
and labels to include supervision. Like Sparse
Additive GEnerative models (SAGE;
        <xref ref-type="bibr" rid="ref9">(Eisenstein et al.,
2011)</xref>
        ), it can produce sparse topic representations,
but unlike SAGE and the Structural Topic Model
(STM;
        <xref ref-type="bibr" rid="ref11 ref23">(Roberts et al., 2014)</xref>
        ), it can easily consider
a larger set of metadata. A network-based topic model
was proposed by Gerlach et al. (2018). In this
framework, the task of finding topical structures
is interpreted as the task of finding communities
in complex networks. It is particularly interesting
because it shows analogies with traditional topic
models and overcomes some of their limitations,
such as being tied to a Bayesian prior and the
need to specify the number of topics in advance.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3 Topic Modelling Games</title>
      <p>
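        Normal-form games consist of a finite set of
players <inline-formula><tex-math>N = (1, \dots, n)</tex-math></inline-formula>, a finite set of pure strategies
<inline-formula><tex-math>S_i = \{1, \dots, m_i\}</tex-math></inline-formula> for each player <inline-formula><tex-math>i \in N</tex-math></inline-formula>, and a
payoff (utility) function <inline-formula><tex-math>u_i : S \to \mathbb{R}</tex-math></inline-formula> that
associates a payoff with each combination of strategies in
<inline-formula><tex-math>S = S_1 \times S_2 \times \dots \times S_n</tex-math></inline-formula>. The payoff function does
not depend only on the strategy chosen by a single
player but on the combination of strategies played
at the same time by all the players. Each player tries
to maximize the value of <inline-formula><tex-math>u_i</tex-math></inline-formula>. Furthermore, in
non-cooperative games the players choose their
strategies independently, considering what the other
players can play and trying to find the best response
to the strategies of the co-players. Nash equilibria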
        <xref ref-type="bibr" rid="ref20">(Nash, 1951)</xref>
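        represent the key concept of game
theory and can be defined as those strategy
combinations in which each strategy is a best response
to the strategies of the co-players, so that no player has
an incentive to unilaterally deviate from them
because there is no way to do better. In addition
to playing pure strategies, which correspond to
selecting just one strategy from those available in <inline-formula><tex-math>S_i</tex-math></inline-formula>,
a player i can also use mixed strategies, which
are probability distributions over pure strategies.
A mixed strategy over <inline-formula><tex-math>S_i</tex-math></inline-formula> is defined as a
vector <inline-formula><tex-math>x_i = (x_i^1, \dots, x_i^{m_i})</tex-math></inline-formula> such that <inline-formula><tex-math>x_i^h \geq 0</tex-math></inline-formula> and
<inline-formula><tex-math>\sum_{h} x_i^h = 1</tex-math></inline-formula>. In a two-player game, a strategy
profile can be defined as a pair <inline-formula><tex-math>(x_i, x_j)</tex-math></inline-formula>. The expected
payoff for this strategy profile is computed as:
<disp-formula><tex-math>u(x_i, x_j) = x_i^{T} A_{ij} x_j</tex-math></disp-formula>
where <inline-formula><tex-math>A_{ij}</tex-math></inline-formula> is the <inline-formula><tex-math>m_i \times m_j</tex-math></inline-formula> payoff matrix between
players i and j.</p>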
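      <p>As an illustration of the expected payoff above, the following minimal Python sketch checks that a vector is a valid mixed strategy and evaluates the bilinear form for a two-player game. The function names and the toy payoff matrix are assumptions introduced for this example only; they are not part of the original framework.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def is_mixed_strategy(x, tol=1e-9):
    """True if x has non-negative entries that sum to 1."""
    return all(v >= 0 for v in x) and abs(sum(x) - 1.0) <= tol


def expected_payoff(x_i, A_ij, x_j):
    """Expected payoff u(x_i, x_j) = x_i^T A_ij x_j for a two-player game."""
    return sum(
        x_i[h] * sum(A_ij[h][k] * x_j[k] for k in range(len(x_j)))
        for h in range(len(x_i))
    )


# Toy example (hypothetical values): two topics and an identity payoff
# matrix, so players are rewarded when they choose the same topic.
A = [[1.0, 0.0], [0.0, 1.0]]
x_i = [0.5, 0.5]
x_j = [0.9, 0.1]
payoff = expected_payoff(x_i, A, x_j)
```
]]></preformat>
      <p>With an identity payoff matrix the expected payoff reduces to the dot product of the two mixed strategies, so it grows as the two players concentrate mass on the same topic.</p>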
      <p>
        Evolutionary game theory
        <xref ref-type="bibr" rid="ref30">(Weibull, 1997)</xref>
        has
introduced two important modifications: 1. the
games are played repeatedly, and 2. the players
update their mixed strategies over time until it is no longer
possible to improve the payoff. With
these two modifications, the players can develop an inductive
learning process that allows them to learn their
strategy distribution according to what the other
players are selecting. The payoff corresponding to the
h-th pure strategy is computed as:
<disp-formula id="eq-1"><label>(1)</label><tex-math>u(x_i^h) = x_i^h \cdot \sum_{j=1}^{n_i} (A_{ij} x_j)^h</tex-math></disp-formula>
The average payoff of player i is calculated as:
<disp-formula id="eq-2"><label>(2)</label><tex-math>u(x_i) = \sum_{h=1}^{m_i} u(x_i^h)</tex-math></disp-formula>
To find the Nash equilibrium of the game, it is
common to use the replicator dynamics equation
        <xref ref-type="bibr" rid="ref30">(Weibull, 1997)</xref>
        . It allows better-than-average
strategies to grow at each iteration. It can be
considered an inductive learning process in which
the players learn from past experience how to
play their best strategy. It is important to notice
that each player optimizes its individual strategy
space, but this is done according to what the
other players are simultaneously doing, so the local
optimization is the result of a global process.
      </p>
      <p>
Data Preparation. The players of the topic
modelling games are the words <inline-formula><tex-math>v = (1, \dots, n)</tex-math></inline-formula> in the
vocabulary V of the corpus under analysis and the
strategies <inline-formula><tex-math>S = (1, \dots, m)</tex-math></inline-formula> are the topics to extract
from the same corpus. The strategy space <inline-formula><tex-math>x_i</tex-math></inline-formula> of
each player i is represented as a probability
distribution that can be interpreted as the mixture of
topics typically used in topic modeling. The
interactions among the players are modeled using
the <inline-formula><tex-math>n \times n</tex-math></inline-formula> adjacency matrix W of an undirected
weighted graph. Each entry <inline-formula><tex-math>w_{ij}</tex-math></inline-formula> encodes the
similarity between two words. The strategy space of
the games can be represented as an <inline-formula><tex-math>n \times m</tex-math></inline-formula> matrix
X, where each row represents the probability
distribution of a player over its m strategies (the topics
that have to be extracted from the corpus).
      </p>
      <p>
Payoff Function and System Dynamics. The
payoff function of the game is constructed by
exploiting the information stored in W. This
matrix gives us the structural information of the
corpus. It allows us to select the players with whom
each player plays the games, indicated by
the presence of an edge between two nodes
(players), and to quantify the level of influence that each
player has on the other, indicated by the weight
on each edge. The absence of an edge in this graph
indicates that two words are distributionally
dissimilar. Using these three sources of information, we
model a payoff function that forces similar players
to choose similar strategies (topics) and dissimilar
players to choose different ones. The payoff of a
player is calculated as:
      </p>
      <p>
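        <disp-formula id="eq-3"><label>(3)</label><tex-math>u(x_i^h) = x_i^h \left( \sum_{j=1}^{n_i} (A_{ij} x_j)^h - \sum_{g=1}^{neg_i} (\lambda x_g)^h \right)</tex-math></disp-formula>
where the first summation is over all the <inline-formula><tex-math>n_i</tex-math></inline-formula>
direct neighbors of player i, that is, the players with
whom i shares some similarity, and the second
summation is over the <inline-formula><tex-math>neg_i</tex-math></inline-formula> negative players of player
i, that is, the players with whom player i does not
share any similarity. With the first summation,
player i negotiates with its neighbors a
correlated strategy (topic); with the second, it
deviates from the strategies chosen by the negative players.
This is done by subtracting the payoff that i would
have gained if these negative players had
been its neighbors. The negative players are
sampled from V according to their frequency, in the same
way that negative samples are selected in word
embedding models
        <xref ref-type="bibr" rid="ref19 ref24 ref26 ref27">(Mikolov et al., 2013; Tripodi and
Pira, 2017)</xref>
        . The probability of selecting a word as negative is:
<disp-formula id="eq-4"><label>(4)</label><tex-math>P(w_i) = \frac{f(w_i)^{3/4}}{\sum_{j=0}^{n} f(w_j)^{3/4}}</tex-math></disp-formula>
where <inline-formula><tex-math>f(w_i)</tex-math></inline-formula> is the frequency of word <inline-formula><tex-math>w_i</tex-math></inline-formula>. Since
the similarity with negative players is 0, we
introduced a parameter to weight their influence (written here as <inline-formula><tex-math>\lambda</tex-math></inline-formula>, because its symbol is missing in the source text) and
set it to (A &gt; 0). The number of negative players,
<inline-formula><tex-math>neg_i</tex-math></inline-formula>, is set to <inline-formula><tex-math>n_i</tex-math></inline-formula> (the number of neighbours of player
i).</p>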
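      <p>The frequency-based selection of negative players can be sketched as follows. This is a minimal illustration of Equation (4), assuming a dictionary of word frequencies; the function names, the exclusion set, and the use of sampling with replacement are assumptions of this example rather than details confirmed by the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def negative_sampling_probs(freq, power=0.75):
    """P(w) proportional to f(w)^(3/4), normalised over the vocabulary."""
    weights = {w: f ** power for w, f in freq.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def sample_negatives(freq, exclude, k, rng):
    """Draw k negative players, skipping the words in `exclude`.

    `exclude` would typically hold the player itself and its neighbours.
    Sampling is done with replacement (an assumption of this sketch).
    """
    probs = negative_sampling_probs(freq)
    candidates = [w for w in probs if w not in exclude]
    weights = [probs[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=k)


# Hypothetical toy frequencies.
freq = {"game": 16, "topic": 4, "payoff": 1}
probs = negative_sampling_probs(freq)
negatives = sample_negatives(freq, exclude={"game"}, k=3, rng=random.Random(0))
```
]]></preformat>
      <p>Raising the frequencies to the power 3/4 flattens the distribution: a word that is 16 times more frequent than another is only 8 times more likely to be drawn.</p>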
      <p>
        Once the players have played all the games with
their neighbors and negative players, the average
payoff of each player can be calculated with
Equation (2). The payoff is higher when two words are
highly correlated and have a similar mixed
strategy. For this reason the replicator dynamics
equation
        <xref ref-type="bibr" rid="ref30">(Weibull, 1997)</xref>
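        is used to compute the
dynamics of the system. It pushes the players to be
influenced by the mixed strategies of their co-players.
This influence is proportional to the similarity
between two players (<inline-formula><tex-math>A_{ij}</tex-math></inline-formula>). Once the influence
dynamics no longer affect the players, the Nash
equilibrium of the system is reached. The dynamics
stop when one of two criteria is met: 1. the maximum
number of iterations (<inline-formula><tex-math>10^{5}</tex-math></inline-formula>) is reached; or 2. the
difference between two consecutive iterations,
calculated as <inline-formula><tex-math>\sum_{i=1}^{n} \lVert x_i(t-1) - x_i(t) \rVert</tex-math></inline-formula>, falls below <inline-formula><tex-math>10^{-3}</tex-math></inline-formula>.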
      </p>
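      <p>The game dynamics described in this section can be sketched in a few lines of Python. The sketch below is not the author's implementation: it assumes that the payoff matrix between two players is their similarity times the identity matrix, it names the weight of the negative players lam, and it clips non-positive payoffs at a small constant so that the discrete-time replicator update stays well defined. All function and variable names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def replicator_step(X, W, negatives, lam, eps=1e-12):
    """One synchronous replicator update for every player.

    X: list of mixed strategies (one list of topic probabilities per word).
    W: dict {i: {j: similarity}} holding the neighbours of each player.
    negatives: dict {i: [g, ...]} holding the negative players of each player.
    lam: weight of the negative players (hypothetical name).
    """
    n_topics = len(X[0])
    new_X = []
    for i, x_i in enumerate(X):
        support = [0.0] * n_topics
        for j, w_ij in W.get(i, {}).items():   # attraction from neighbours
            for h in range(n_topics):
                support[h] += w_ij * X[j][h]
        for g in negatives.get(i, []):         # repulsion from negative players
            for h in range(n_topics):
                support[h] -= lam * X[g][h]
        # Assumption of this sketch: clip at eps so payoffs stay positive.
        u = [x_i[h] * max(support[h], eps) for h in range(n_topics)]
        total = sum(u)                         # average payoff of player i
        new_X.append([v / total for v in u])
    return new_X


def run_dynamics(X, W, negatives, lam, max_iter=10**5, tol=1e-3):
    """Iterate until the summed change of all strategies falls below tol."""
    for _ in range(max_iter):
        new_X = replicator_step(X, W, negatives, lam)
        change = sum(math.dist(a, b) for a, b in zip(X, new_X))
        X = new_X
        if change < tol:
            break
    return X


# Toy graph: words 0-1 are similar, words 2-3 are similar, and the two
# groups are negative players of each other.
X0 = [[0.6, 0.4], [0.55, 0.45], [0.4, 0.6], [0.45, 0.55]]
W = {0: {1: 1.0}, 1: {0: 1.0}, 2: {3: 1.0}, 3: {2: 1.0}}
negatives = {0: [2], 1: [3], 2: [0], 3: [1]}
X_final = run_dynamics(X0, W, negatives, lam=0.5)
topics = [row.index(max(row)) for row in X_final]
```
]]></preformat>
      <p>In this toy run the two pairs of similar words settle on different topics, which is the behaviour the payoff function is designed to induce. Each row of the result remains a probability distribution because the update divides by the average payoff.</p>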
    </sec>
    <sec id="sec-5">
      <title>4 Experimental Results</title>
      <p>In this section, we evaluate Topic Modelling Games (TMG) and compare it
with state-of-the-art systems.</p>
      <sec id="sec-5-1">
        <title>4.1 Data and Setting</title>
        <p>
          The datasets used to evaluate TMG are 20
Newsgroups (20NG; footnote 1) and NIPS (footnote 2). 20NG is a collection
of about 20,000 documents organized into 20
different classes. NIPS is composed of about 1,700
NIPS conference papers published between 1987
and 1999, with no class information. Each text was
tokenized and lowercased. The stop-words were
removed and the vocabulary was constructed
from the 1000 and 2000 most frequent words
in 20NG and NIPS, respectively. This choice is in
line with previous work
          <xref ref-type="bibr" rid="ref10 ref5">(Card et al., 2018)</xref>
          . To
keep the model as simple as possible, tf-idf
weighting was used to construct the feature
vectors of the words and cosine similarity was
employed to create the adjacency matrix A. It is
important to notice that other sources of
information, derived from pre-trained word embeddings, syntactic
structures, or document metadata, can easily be included at this stage.
Then A is sparsified by keeping only the r nearest neighbours of each
node, where r = log(n). This operation
reduces the computational cost of the algorithm
and guarantees that the graph remains connected
          <xref ref-type="bibr" rid="ref29">(Von Luxburg, 2007)</xref>
          .
        </p>
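        <p>The construction of the word graph can be sketched as follows, using only the Python standard library. The sketch assumes one particular tf-idf variant (raw term frequency times a smoothed inverse document frequency), rounds r = log(n) to the nearest integer, and symmetrizes the neighbour lists to keep the graph undirected. These choices, like the function names, are assumptions of this example and are not specified in the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def word_vectors(docs):
    """Return {word: {doc_id: tf-idf weight}} for tokenised documents."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    df = Counter(w for c in counts for w in c)      # document frequency
    vectors = {}
    for d, c in enumerate(counts):
        for w, tf in c.items():
            idf = math.log(n_docs / df[w]) + 1.0    # smoothed idf (assumption)
            vectors.setdefault(w, {})[d] = tf * idf
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v[k] for k, x in u.items() if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def sparse_adjacency(vectors):
    """Keep the r = log(n) most similar words of each word."""
    words = sorted(vectors)
    if not words:
        return {}
    r = max(1, round(math.log(len(words))))
    W = {w: {} for w in words}
    for w in words:
        sims = [(cosine(vectors[w], vectors[o]), o) for o in words if o != w]
        sims.sort(reverse=True)
        for s, o in sims[:r]:
            if s > 0:
                W[w][o] = s
                W[o][w] = s                         # keep the graph undirected
    return W


# Hypothetical toy corpus with two groups of documents.
docs = [
    ["game", "player", "strategy"],
    ["game", "player", "payoff"],
    ["topic", "word", "corpus"],
    ["topic", "word", "model"],
]
W = sparse_adjacency(word_vectors(docs))
```
]]></preformat>
        <p>Words that never occur in the same documents have zero cosine similarity and therefore no edge, which is what makes them candidates for negative players in the game.</p>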
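        <p>1 http://qwone.com/~jason/20Newsgroups/</p>
        <p>2 http://www.cs.nyu.edu/~roweis/data.html</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>Comparison of perplexity.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Dataset</th><th>TMG</th><th>SCHOLAR</th><th>NVDM</th><th>LDA</th></tr>
            </thead>
            <tbody>
              <tr><td>20NG</td><td>824</td><td>819</td><td>927</td><td>791</td></tr>
              <tr><td>NIPS</td><td>1311</td><td>1370</td><td>1564</td><td>1017</td></tr>
            </tbody>
          </table>
        </table-wrap>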
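        <p>The strategy space of the players was initialized
using a normal distribution to reduce the number of
parameters of the framework (footnote 3). The last two parameters
of the system concern the stopping criteria of the
dynamics: 1. the maximum number of
iterations (<inline-formula><tex-math>10^{5}</tex-math></inline-formula>); and 2. the minimum difference
between two consecutive iterations (<inline-formula><tex-math>10^{-3}</tex-math></inline-formula>), which is
calculated as <inline-formula><tex-math>\sum_{i=1}^{n} \lVert x_i(t-1) - x_i(t) \rVert</tex-math></inline-formula>.</p>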
        <p>
          TMG has been compared with SCHOLAR (footnote 4),
LDA (footnote 5) and NVDM (footnote 6). We configured the
NVDM network with two encoder layers
(500-dimensional) and ReLU non-linearities.
SCHOLAR has been configured using a more
complex setting that consists of a single-layer
encoder and a 4-layer generator. LDA has been
run with the following parameters: = 50,
iterations = 1000 and topicthreshold = 0.
In this section, we compare the generalization
performance of TMG with that of
the models presented in the previous section. For
the evaluation we used perplexity (PPL), even though
it has been shown not to correlate with human
interpretation of topics
          <xref ref-type="bibr" rid="ref6 ref7">(Chang et al., 2009)</xref>
          . We
computed perplexity on unobserved documents
(C) as:
<disp-formula id="eq-5"><label>(5)</label><tex-math>\mathrm{PPL}(C) = \exp\left( -\frac{1}{N} \, \frac{\sum_{n=1}^{N} \log P(C_n)}{\sum_{n=1}^{N} D_n} \right)</tex-math></disp-formula>
where N is the number of documents in the
collection C. A low perplexity suggests less uncertainty
about the documents. Held-out documents
represent 15% of each dataset. Perplexity is
computed for 10 topics for the NIPS dataset and 20
topics for the 20 Newsgroups dataset. These
numbers correspond to the real number of classes of
each dataset.</p>
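        <p>Held-out perplexity can be illustrated with the short sketch below. It uses the widely used per-token definition, the exponential of the negative total log-likelihood divided by the total number of tokens; the exact normalisation of Equation (5) may differ from this form. The function name and the toy inputs are assumptions of this example, and the per-document log-probabilities would come from whichever model is being evaluated.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def perplexity(doc_log_probs, doc_lengths):
    """Per-token perplexity of a collection of held-out documents.

    doc_log_probs: natural-log probability of each document under the model.
    doc_lengths: number of tokens in each document.
    """
    total_log_prob = sum(doc_log_probs)
    total_tokens = sum(doc_lengths)
    return math.exp(-total_log_prob / total_tokens)


# Sanity check (hypothetical model): a model that assigns probability
# 1/1000 to every token has a perplexity of 1000.
lengths = [10, 20]
log_probs = [-n * math.log(1000) for n in lengths]
ppl = perplexity(log_probs, lengths)
```
]]></preformat>
        <p>The sanity check shows how the number can be read: a perplexity of 1000 means the model is, on average, as uncertain as a uniform choice among 1000 words.</p>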
        <p>
          Table 1 shows the comparison of perplexity. As
reported in previous work
          <xref ref-type="bibr" rid="ref10 ref5">(Card et al., 2018)</xref>
          , it is
difficult to achieve a lower perplexity than LDA.
The results in these experiments follow the same
pattern: LDA has the lowest perplexity,
TMG and SCHOLAR have similar results,
and NVDM performs slightly worse on both
datasets.
        </p>
        <p>3 Experimentally, it was also observed that using a
Dirichlet distribution with different parameters to initialize the strategy space
did not much affect the performance of the
model.</p>
        <p>4 https://github.com/dallascard/scholar</p>
        <p>5 http://mallet.cs.umass.edu</p>
        <p>6 https://github.com/ysmiao/nvdm</p>
        <p>
          [Figure: panels (a) 20NG and (b) NIPS.]
        </p>
        <p>
It has been shown that perplexity does not
necessarily correlate well with topic coherence
          <xref ref-type="bibr" rid="ref24 ref26 ref27 ref6 ref7">(Chang
et al., 2009; Srivastava and Sutton, 2017)</xref>
          . For this
reason, we also evaluated the performance of our
system on coherence
          <xref ref-type="bibr" rid="ref4 ref6 ref7 ref8">(Chang et al., 2009; Das et
al., 2015)</xref>
          . Coherence is calculated by
computing the relatedness between topic words using
pointwise mutual information (PMI). We used
Wikipedia (2018.05.01 dump) as the corpus to
compute co-occurrence statistics, using a sliding
window of 5 words on the left and on the right of
each target word. For each topic, we selected the
10 words with the highest mass. Then we
calculated the PMI between all the word pairs and finally
computed the coherence as the arithmetic mean of
all these values. This metric has been shown to
correlate well with human judgments
          <xref ref-type="bibr" rid="ref17">(Lau et al.,
2017)</xref>
          . We used two different sources of
information for the computation of the PMI: one is
internal and corresponds to the dataset under analysis;
the other one is external and is represented by the
English Wikipedia corpus.
        </p>
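        <p>The coherence computation described above can be sketched as follows. The sketch counts co-occurrences within a sliding window, estimates probabilities by simple relative frequencies with a small smoothing constant, and averages the PMI over all pairs of topic words. The probability estimates, the smoothing constant, and the function names are assumptions of this example; the text does not specify them.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tokens, window=5):
    """Count words and unordered word pairs that co-occur within `window`."""
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        # Looking only to the right counts every unordered pair once.
        for o in tokens[i + 1:i + 1 + window]:
            if o != w:
                pair_counts[frozenset((w, o))] += 1
    return word_counts, pair_counts


def pmi_coherence(topic_words, word_counts, pair_counts, eps=1e-12):
    """Arithmetic mean of the PMI over all pairs of topic words."""
    n_tokens = sum(word_counts.values())
    n_pairs = max(sum(pair_counts.values()), 1)
    scores = []
    for a, b in combinations(topic_words, 2):
        p_ab = pair_counts[frozenset((a, b))] / n_pairs
        p_a = word_counts[a] / n_tokens
        p_b = word_counts[b] / n_tokens
        # eps avoids log(0) for pairs that never co-occur (assumption).
        scores.append(math.log((p_ab + eps) / (p_a * p_b + eps)))
    return sum(scores) / len(scores)


# Hypothetical toy corpus; a real evaluation would use a large reference
# corpus and the 10 highest-mass words of each topic.
tokens = ["turkish", "armenian", "x", "moral", "atheists", "y"] * 10
wc, pc = cooccurrence_counts(tokens, window=1)
coherent = pmi_coherence(["turkish", "armenian"], wc, pc)
incoherent = pmi_coherence(["turkish", "moral"], wc, pc)
```
]]></preformat>
        <p>Word pairs that co-occur more often than their individual frequencies would predict receive a positive PMI, so a topic made of such words obtains a higher coherence than a topic mixing unrelated words.</p>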
        <p>Internal PMI. Figure 1 presents the PMI
values of the different models computed on the two
corpora. As Figure 1a shows,
TMG has a low PMI compared to all other
systems on the 20 Newsgroups dataset when there are
few topics to extract (i.e., 2 and 5). The situation
changes drastically when the number of topics
increases: TMG has the highest performance on
this dataset when it extracts 10, 20, 50, or 100 topics.
The performances of NVDM and SCHOLAR are
similar and follow a decreasing pattern, with very
high values at the beginning. In contrast, the
performance of LDA follows the opposite pattern:
this model seems to work better when the
number of topics to extract is high. On NIPS (Figure
1b) the performances of the systems are similar to
those on 20 Newsgroups. The only exception is
that TMG always has the highest PMI and seems
to behave better also when the number of topics to
extract is high. This is probably because the number
of words in NIPS is higher, which makes it
reasonable to also have a higher number of topics. This
is also confirmed by the qualitative analysis of the
topics in Section 4.4, which shows that
with low values of k (the number of topics) it is possible to extract
general topics, and that by increasing k it is possible to
extract more specific ones.</p>
        <p>In general, we can identify three different patterns
in these experiments: 1. NVDM and SCHOLAR
work well when extracting a low number of topics;
2. LDA works well when it has to extract a large
number of topics; 3. TMG works well when
extracting a number of topics that is close to the real
number of classes in the datasets. Another aspect to
take into account is that, even though TMG has
the highest performance, its results also have a
high standard deviation. This is due to the
stochastic nature of negative sampling.
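        </p>
        <p>[Table fragment recovered from extraction: top words of two topics. First topic: turks, soviet, turkish, armenian, armenia, passes, roads, armenians, argic, proceeded (last row: 29.71). Second topic: schneider, allan, morality, keith, atheists, moral, political, pasadena, objective, animals (last row: 15.27).]</p>
        <p>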
Sparsity. We compared the sparsity of the
word-topic matrices X in Figures 3a and 3b, computed
as <inline-formula><tex-math>s = |X &gt; 10^{-3}| / |X|</tex-math></inline-formula>, that is, the fraction of entries of X that are larger than <inline-formula><tex-math>10^{-3}</tex-math></inline-formula>. From both figures, we can see
that TMG can produce highly sparse
representations, especially when the number of topics to
extract is low. This is a desirable feature, since it provides
more interpretable results. Only SCHOLAR
produces sparser representations when the
number of topics to extract is high. Experimentally, we
also noticed that the sparsity of X
in TMG can be controlled by increasing the number of iterations of the
game dynamics.</p>
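        <p>The sparsity measure can be computed with the following minimal sketch, which returns the fraction of entries of a word-topic matrix that exceed the threshold. Under this reading a smaller value indicates a sparser matrix. The function name and the toy matrix are assumptions of this example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def active_fraction(X, threshold=1e-3):
    """Fraction of entries of X (a list of rows) larger than `threshold`."""
    total = sum(len(row) for row in X)
    active = sum(1 for row in X for v in row if v > threshold)
    return active / total


# Toy word-topic matrix (hypothetical values): the first word uses one
# topic only, the second word spreads its mass over both topics.
X = [[1.0, 0.0], [0.5, 0.5]]
s = active_fraction(X)
```
]]></preformat>
        <p>In the toy matrix three of the four entries exceed the threshold, so the measure is 0.75; a matrix in which every word is assigned to a single topic would give 1 divided by the number of topics.</p>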
      </sec>
      <sec id="sec-5-2">
        <title>4.4 Qualitative Evaluation</title>
        <p>Examples of topics extracted from 20NG and
NIPS are presented in Tables 2 and 3, respectively (footnote 7).
The first difference that emerges from these results
is in the external PMI values: the texts in NIPS use a very specific
language, and for this reason their PMI values are very
high. We can also see that TMG groups highly
coherent sets of words in each topic. In Table 2 we can easily
identify the topics into which the dataset
is organized, in particular: talk.politics.mideast,
alt.atheism, comp.graphics, soc.religion.christian,
talk.politics.misc, rec.motorcycles, sci.crypt,
talk.politics.guns, rec.sport.hockey, sci.space,
talk.politics.misc.</p>
        <p>7 Due to space limitations, we present only 15 topics for
20NG.</p>
        <p>We can also easily identify in Table 3 highly
coherent topics, related to optics, signal analysis,
optimization, crowdsourcing, audio, graph theory
and logic. These topics
are general, and it is possible to discover more
specific topics by increasing the number of topics to
extract. For example, we discovered topics related
to topic modelling and generative adversarial
networks.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 Conclusion and Future Work</title>
      <p>In this paper, we presented a new topic
modeling framework based on game-theoretic
principles. The results of its evaluation show that the
model performs well compared to state-of-the-art
systems and that it can extract topically and
semantically related groups of words. In this work,
the model was kept as simple as possible to assess
whether a game-theoretic framework is in itself suited to
topic modeling. In future work, it will be
interesting to introduce the topic-document distribution,
to test the model on classification tasks, and to use covariates
to extract topics along different dimensions, such
as time, authorship, or opinion. The framework
is open and flexible, and in future work it will be
tested with different initializations of the strategy
space, graph structures, and payoff functions. It
will be particularly interesting to test it using word
embeddings and syntactic information.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bengio et al.2003]
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          , Réjean Ducharme, Pascal Vincent, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Jauvin</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>A neural probabilistic language model</article-title>
          .
          <source>Journal of machine learning research</source>
          ,
          <volume>3</volume>
          (Feb):
          <fpage>1137</fpage>
          -
          <lpage>1155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Blei et al.2003]
          <string-name>
            <given-names>David M</given-names>
            <surname>Blei</surname>
          </string-name>
          , Andrew Y Ng, and
          <string-name>
            <given-names>Michael I</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>Journal of machine Learning research</source>
          ,
          <volume>3</volume>
          (Jan):
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Blei2012]
          <string-name>
            <given-names>David M.</given-names>
            <surname>Blei</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Probabilistic topic models</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>55</volume>
          (
          <issue>4</issue>
          ):
          <fpage>77</fpage>
          -
          <lpage>84</lpage>
          , April.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Cao et al.2015]
          <string-name>
            <given-names>Ziqiang</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sujian</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yang</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wenjie</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Heng</given-names>
            <surname>Ji</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A novel neural topic model and its supervised extension</article-title>
          .
          <source>In AAAI</source>
          , pages
          <fpage>2210</fpage>
          -
          <lpage>2216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
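          [Card et al.2018]
          <string-name>
            <given-names>Dallas</given-names>
            <surname>Card</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chenhao</given-names>
            <surname>Tan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Noah A</given-names>
            <surname>Smith</surname>
          </string-name>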
          .
          <year>2018</year>
          .
          <article-title>Neural models for documents with metadata</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the ACL</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>2031</fpage>
          -
          <lpage>2040</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Chang et al.2009]
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Chang</surname>
          </string-name>
          , Sean Gerrish, Chong Wang,
          <string-name>
            <surname>Jordan L Boyd-Graber</surname>
          </string-name>
          , and David M Blei.
          <year>2009</year>
          .
          <article-title>Reading tea leaves: How humans interpret topic models</article-title>
          .
          <source>In NIPS</source>
          , pages
          <fpage>288</fpage>
          -
          <lpage>296</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Chong et al.2009]
          <string-name>
            <given-names>Wang</given-names>
            <surname>Chong</surname>
          </string-name>
          , David Blei, and
          <string-name>
            <given-names>FeiFei</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Simultaneous image classification and annotation</article-title>
          .
          <source>In CVPR</source>
          ,
          <year>2009</year>
          .
          <article-title>CVPR 2009</article-title>
          . IEEE Conference on, pages
          <fpage>1903</fpage>
          -
          <lpage>1910</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Das et al.2015]
          <string-name>
            <given-names>Rajarshi</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Manzil</given-names>
            <surname>Zaheer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Dyer</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Gaussian LDA for topic models with word embeddings</article-title>
          .
          <source>In Proceedings of the 53rd Annual Meeting of the ACL</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>795</fpage>
          -
          <lpage>804</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Eisenstein et al.2011]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Eisenstein</surname>
          </string-name>
          , Amr Ahmed, and Eric P Xing.
          <year>2011</year>
          .
          <article-title>Sparse additive generative models of text</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Gerlach et al.2018]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Gerlach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tiago P.</given-names>
            <surname>Peixoto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Eduardo G.</given-names>
            <surname>Altmann</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A network approach to topic models</article-title>
          .
          <source>Science Advances</source>
          ,
          <volume>4</volume>
          (
          <issue>7</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Goodfellow et al.2014]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jean</given-names>
            <surname>Pouget-Abadie</surname>
          </string-name>
          , Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Generative adversarial nets</article-title>
          .
          <source>In NIPS</source>
          , pages
          <fpage>2672</fpage>
          -
          <lpage>2680</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Griffiths and Steyvers2004] Thomas L Griffiths and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Steyvers</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Finding scientific topics</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>101</volume>
          (
          <issue>suppl 1</issue>
          ):
          <fpage>5228</fpage>
          -
          <lpage>5235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Hinton and Salakhutdinov2009]
          <string-name>
            <given-names>Geoffrey E</given-names>
            <surname>Hinton</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ruslan R</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Replicated softmax: an undirected topic model</article-title>
          .
          <source>In NIPS</source>
          , pages
          <fpage>1607</fpage>
          -
          <lpage>1614</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Hofmann1999]
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Hofmann</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Probabilistic latent semantic indexing</article-title>
          .
          <source>In Proceedings of the 22nd annual international ACM SIGIR conference</source>
          , pages
          <fpage>50</fpage>
          -
          <lpage>57</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Lafferty and Blei2006]
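          <string-name>
            <given-names>John D</given-names>
            <surname>Lafferty</surname>
          </string-name>
          and
          <string-name>
            <given-names>David M</given-names>
            <surname>Blei</surname>
          </string-name>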
          .
          <year>2006</year>
          .
          <article-title>Correlated topic models</article-title>
          .
          <source>In NIPS</source>
          , pages
          <fpage>147</fpage>
          -
          <lpage>154</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
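          [Larochelle and Lauly2012]
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Larochelle</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stanislas</given-names>
            <surname>Lauly</surname>
          </string-name>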
          .
          <year>2012</year>
          .
          <article-title>A neural autoregressive topic model</article-title>
          .
          <source>In NIPS</source>
          , pages
          <fpage>2708</fpage>
          -
          <lpage>2716</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Lau et al.2017] Jey Han Lau,
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Baldwin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Trevor</given-names>
            <surname>Cohn</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Topically driven neural language model</article-title>
          .
          <source>In Proceedings of the 55th Annual Meeting of the ACL</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>355</fpage>
          -
          <lpage>365</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Mcauliffe and Blei2008]
          <string-name>
            <given-names>Jon D</given-names>
            <surname>Mcauliffe</surname>
          </string-name>
          and
          <string-name>
            <given-names>David M</given-names>
            <surname>Blei</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Supervised topic models</article-title>
          .
          <source>In NIPS</source>
          , pages
          <fpage>121</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Mikolov et al.2013]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>CoRR</source>
          , abs/1301.3781.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Nash1951]
          <string-name>
            <given-names>John</given-names>
            <surname>Nash</surname>
          </string-name>
          .
          <year>1951</year>
          .
          <article-title>Non-cooperative games</article-title>
          .
          <source>Annals of mathematics</source>
          , pages
          <fpage>286</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
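          [Pavan and Pelillo2007]
          <string-name>
            <given-names>Massimiliano</given-names>
            <surname>Pavan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Pelillo</surname>
          </string-name>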
          .
          <year>2007</year>
          .
          <article-title>Dominant sets and pairwise clustering</article-title>
          .
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          ,
          <volume>29</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Ranzato and Szummer2008]
          <string-name>
            <given-names>Marc'Aurelio</given-names>
            <surname>Ranzato</surname>
          </string-name>
          and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Szummer</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Semi-supervised learning of compact document representations with deep networks</article-title>
          .
          <source>In Proceedings of the 25th international conference on Machine learning</source>
          , pages
          <fpage>792</fpage>
          -
          <lpage>799</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Roberts et al.2014]
          <string-name>
            <given-names>Margaret E</given-names>
            <surname>Roberts</surname>
          </string-name>
          , Brandon M Stewart,
          <string-name>
            <given-names>Dustin</given-names>
            <surname>Tingley</surname>
          </string-name>
          , Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David G Rand.
          <year>2014</year>
          .
          <article-title>Structural topic models for open-ended survey responses</article-title>
          .
          <source>American Journal of Political Science</source>
          ,
          <volume>58</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1064</fpage>
          -
          <lpage>1082</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Srivastava and Sutton2017]
          <string-name>
            <given-names>Akash</given-names>
            <surname>Srivastava</surname>
          </string-name>
          and
          <string-name>
            <given-names>Charles</given-names>
            <surname>Sutton</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Autoencoding variational inference for topic models</article-title>
          .
          <source>In International Conference on Learning Representations (ICLR).</source>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
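          [Tripodi and Navigli2019]
          <string-name>
            <given-names>Rocco</given-names>
            <surname>Tripodi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>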
          .
          <year>2019</year>
          .
          <article-title>Game theory meets embeddings: a unified framework for word sense disambiguation</article-title>
          .
          <source>In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          , pages
          <fpage>88</fpage>
          -
          <lpage>99</lpage>
          , Hong Kong, China, November. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
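          [Tripodi and Pelillo2017]
          <string-name>
            <given-names>Rocco</given-names>
            <surname>Tripodi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Pelillo</surname>
          </string-name>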
          .
          <year>2017</year>
          .
          <article-title>A game-theoretic approach to word sense disambiguation</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>43</volume>
          (
          <issue>1</issue>
          ):
          <fpage>31</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
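          [Tripodi and Pira2017]
          <string-name>
            <given-names>Rocco</given-names>
            <surname>Tripodi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Li Pira</surname>
          </string-name>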
          .
          <year>2017</year>
          .
          <article-title>Analysis of Italian word embeddings</article-title>
          .
          <source>In Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)</source>
          , Rome, Italy, December 11-13, 2017.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Tripodi et al.2016]
          <string-name>
            <given-names>Rocco</given-names>
            <surname>Tripodi</surname>
          </string-name>
          , Sebastiano Vascon, and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Pelillo</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Context aware nonnegative matrix factorization clustering</article-title>
          .
          <source>In 23rd International Conference on Pattern Recognition, ICPR 2016</source>
          , Cancún, Mexico, December 4-8, 2016, pages
          <fpage>1719</fpage>
          -
          <lpage>1724</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [Von Luxburg2007]
          <string-name>
            <given-names>Ulrike</given-names>
            <surname>Von Luxburg</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A tutorial on spectral clustering</article-title>
          .
          <source>Statistics and computing</source>
          ,
          <volume>17</volume>
          (
          <issue>4</issue>
          ):
          <fpage>395</fpage>
          -
          <lpage>416</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [Weibull1997]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Weibull</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Evolutionary game theory</article-title>
          . MIT Press.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>