=Paper=
{{Paper
|id=Vol-2769/76
|storemode=property
|title=Topic Modelling Games
|pdfUrl=https://ceur-ws.org/Vol-2769/paper_76.pdf
|volume=Vol-2769
|authors=Rocco Tripodi
|dblpUrl=https://dblp.org/rec/conf/clic-it/Tripodi20
}}
==Topic Modelling Games==
Rocco Tripodi
Sapienza NLP Group
Department of Computer Science, Sapienza University of Rome
tripodi@di.uniroma1.it
Abstract

English. This paper presents a new topic modelling framework inspired by game-theoretic principles. It is formulated as a normal-form game in which words are represented as players and topics as the strategies that the players select. The strategies of each player are modelled with a probability distribution guided by a utility function that the players try to maximize. This function induces players to select strategies similar to those selected by similar players, and to avoid strategies selected by dissimilar players. The proposed framework is compared with state-of-the-art models, demonstrating good performances on standard benchmarks.

Italiano. This article presents a topic modelling approach inspired by game theory. Topic modelling is viewed as a normal-form game in which the words represent the players and the topics the strategies that the players can choose. Each player chooses which strategies to employ through a probability distribution that is influenced by a utility function that the players try to maximize. This function encourages players to choose strategies similar to those employed by similar players, and discourages the choice of strategies shared with dissimilar players. The comparison with state-of-the-art models demonstrates good performances on several evaluation datasets.

1 Introduction

Topic modeling is a technique that discovers the underlying topics contained in a collection of documents (Blei, 2012; Griffiths and Steyvers, 2004). It can be used in different tasks of text classification, document retrieval, and sentiment analysis, providing vector representations of both words and documents. State-of-the-art systems are based on probabilistic (Blei et al., 2003; Mcauliffe and Blei, 2008; Chong et al., 2009) and neural network models (Bengio et al., 2003; Hinton and Salakhutdinov, 2009; Larochelle and Lauly, 2012; Cao et al., 2015). A different perspective, based on game theory, is proposed in this article.

The use of game-theoretic principles in machine learning (Goodfellow et al., 2014), pattern recognition (Pavan and Pelillo, 2007) and natural language processing (Tripodi et al., 2016; Tripodi and Navigli, 2019) is a developing and promising field of research that has produced original models. The main difference between computational models based on optimization techniques and game-theoretic models is that the former try to maximize (or minimize) a function (which in many cases is non-convex), while the latter try to find the equilibrium state of a dynamical system. The equilibrium concept is useful because it represents a state in which all the constraints of a given system are satisfied and no object of the system has an incentive to deviate from it, because a different configuration would immediately lead to a worse situation in terms of payoff and fitness, at both the object and the system level. Furthermore, it is guaranteed that the system converges to a mixed-strategy Nash equilibrium (Nash, 1951). So far, game-theoretic models have been used in classification and clustering tasks (Pavan and Pelillo, 2007; Tripodi and Pelillo, 2017). In this work, a game-theoretic model is proposed for inferring a low-dimensional representation of words that can capture their latent semantics.

In this work, topic modeling is interpreted as a symmetric non-cooperative game (Weibull, 1997) in which the words are the players and the topics
are the strategies that the players can select. Two players are matched to play the games together according to the co-occurrence patterns found in the corpus under study. The players use a probability distribution over their strategies to play the games and obtain a payoff for each strategy. This reward helps them to adjust their strategy selection in future games, considering which strategies have been effective in previous games: it allows concentrating more mass on the strategies that obtain a high reward. The underlying idea behind the payoff function is to create two influence dynamics: the first forces similar players (words that appear in similar contexts) to select similar strategies; the second forces dissimilar players (words that do not share any context) to select different strategies. The games are played repeatedly until the system converges, that is, until the difference between the strategy distributions of the players at time t and at time t − 1 falls below a small threshold. The convergence of the system corresponds to an equilibrium, a situation in which there is an optimal association of words and topics.

2 Related Work

Hofmann (1999) proposed one of the earliest topic models, probabilistic Latent Semantic Indexing (pLSI). It represents each word in a document as a sample from a mixture model, where topics are represented as multinomial random variables and documents as mixtures of topics. Latent Dirichlet Allocation (LDA) (Blei et al., 2003), the most widely used topic model, is a generalization of pLSI that introduces Dirichlet priors for both the word multinomial distributions over topics and the topic multinomial distributions over documents. This line of research has been developed by building on top of LDA different features to infer correlations among topics (Lafferty and Blei, 2006) or to jointly model words and labels in a supervised way (Mcauliffe and Blei, 2008).

Topic models based on neural network principles were introduced with the neural network language model proposed in (Bengio et al., 2003). This paradigm is very popular in NLP, and many topic models are based on it because these techniques make it possible to obtain a low-dimensional representation of the data. In particular, auto-encoders (Ranzato and Szummer, 2008), Boltzmann machines (Hinton and Salakhutdinov, 2009) and autoregressive distributions (Larochelle and Lauly, 2012) have been used to model documents with layer-wise neural network tools. The Neural Topic Model (NTM; Cao et al., 2015) tries to overcome some limitations of classical topic models, such as the initialization problem and the generalization to n-grams. It exploits word embeddings to represent n-grams and uses backpropagation to adjust the weights of the network between the embedding layer and the word-topic and document-topic layers. Another general framework for topic modeling based on neural networks is the Sparse Contextual Hidden and Observed Language AutoencodeR (SCHOLAR; Card et al., 2018). It allows using covariates to influence the topic distributions and labels to include supervision. Like Sparse Additive GEnerative models (SAGE; Eisenstein et al., 2011), it can produce sparse topic representations but, differently from SAGE and the Structural Topic Model (STM; Roberts et al., 2014), it can easily consider a larger set of metadata. A graphical topic model was proposed by Gerlach et al. (2018). In this framework, the task of finding topical structures is interpreted as the task of finding communities in complex networks. It is particularly interesting because it shows analogies with traditional topic models and overcomes some of their limitations, such as the constraint of a Bayesian prior and the need to specify the number of topics in advance.

3 Topic Modelling Games

Normal-form games consist of a finite set of players N = (1, ..., n), a finite set of pure strategies S_i = {1, ..., m_i} for each player i ∈ N, and a payoff (utility) function u_i : S → R that associates a payoff with each combination of strategies S = S_1 × S_2 × ... × S_n. The payoff function does not depend only on the strategy chosen by a single player but on the combination of strategies played at the same time by all the players. Each player tries to maximize the value of u_i. Furthermore, in non-cooperative games the players choose their strategies independently, considering what the other players can play and trying to find the best response to the strategies of the co-players. Nash equilibria (Nash, 1951) are the key concept of game theory and can be defined as those strategy combinations in which each strategy is a best response to the strategies of the co-players and no player has an incentive to unilaterally deviate from them, because there is no way to do better. In addition to playing pure strategies, which correspond to
selecting just one strategy from those available in S_i, a player i can also use mixed strategies, which are probability distributions over pure strategies. A mixed strategy over S_i is defined as a vector x_i = (x_1, ..., x_{m_i}) such that x_j ≥ 0 and $\sum_{j=1}^{m_i} x_j = 1$. In a two-player game, a strategy profile can be defined as a pair (x_i, x_j). The expected payoff for this strategy profile is computed as:

$$u(x_i, x_j) = x_i^T A_{ij} x_j$$

where A_{ij} is the m_i × m_j payoff matrix between players i and j.
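For concreteness, this expected payoff is a single bilinear product. A minimal sketch in NumPy (the payoff matrix values here are invented purely for illustration; in TMG they derive from word similarities):

```python
import numpy as np

# Hypothetical 3-topic payoff matrix A_ij between players i and j.
A_ij = np.array([[1.0, 0.2, 0.1],
                 [0.2, 1.0, 0.3],
                 [0.1, 0.3, 1.0]])

x_i = np.array([0.7, 0.2, 0.1])  # mixed strategy of player i
x_j = np.array([0.6, 0.3, 0.1])  # mixed strategy of player j

# Both strategies must lie on the probability simplex.
assert np.isclose(x_i.sum(), 1.0) and np.isclose(x_j.sum(), 1.0)

u = x_i @ A_ij @ x_j  # expected payoff u(x_i, x_j) = x_i^T A_ij x_j
print(round(float(u), 4))
```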
Evolutionary game theory (Weibull, 1997) introduced two important modifications: 1. the games are played repeatedly, and 2. the players update their mixed strategies over time until it is no longer possible to improve the payoff. With these two modifications, the players can develop an inductive learning process that allows them to learn their strategy distribution according to what the other players are selecting. The payoff corresponding to the h-th pure strategy is computed as:

$$u(x_i^h) = x_i^h \sum_{j=1}^{n_i} (A_{ij} x_j)_h \qquad (1)$$

The average payoff of player i is calculated as:

$$u(x_i) = \sum_{h=1}^{m_i} u(x_i^h) \qquad (2)$$

To find the Nash equilibrium of the game, it is common to use the replicator dynamics equation (Weibull, 1997), which allows better-than-average strategies to grow at each iteration. It can be considered an inductive learning process in which the players learn from past experience how to play their best strategy. It is important to notice that each player optimizes its individual strategy space, but this operation is done according to what the other players are simultaneously doing, so the local optimization is the result of a global process.
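A minimal sketch of a discrete-time replicator dynamics update over the whole strategy matrix (not the author's released code; it assumes A_ij = w_ij · I, so that the influence of j on i is proportional to their similarity, and it ignores the negative players introduced in the next section):

```python
import numpy as np

def replicator_dynamics(X, W, max_iter=10**5, tol=1e-3):
    """Discrete-time replicator dynamics over the strategy matrix X.

    X : (n, m) row-stochastic matrix; row i is player i's mixed strategy.
    W : (n, n) non-negative word-word similarity (adjacency) matrix.
    """
    for _ in range(max_iter):
        payoff = X * (W @ X)                     # payoff[i, h] = u(x_i^h), Eq. (1)
        avg = payoff.sum(axis=1, keepdims=True)  # u(x_i), Eq. (2)
        X_new = payoff / np.where(avg > 0, avg, 1.0)
        # Stop when the strategies change less than tol between iterations.
        if np.abs(X_new - X).sum() < tol:
            return X_new
        X = X_new
    return X
```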
Data Preparation. The players of the topic modelling games are the words v = (1, ..., n) in the vocabulary V of the corpus under analysis, and the strategies S = (1, ..., m) are the topics to extract from the same corpus. The strategy space x_i of each player i is represented as a probability distribution that can be interpreted as the mixture of topics typically used in topic modeling. The interactions among the players are modeled using the n × n adjacency matrix W of an undirected weighted graph. Each entry w_{ij} encodes the similarity between two words. The strategy space of the games can be represented as an n × m matrix X, where each row represents the probability distribution of a player over its m strategies (the topics that have to be extracted from the corpus).
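A possible realization of this preparation step, as a sketch rather than the author's released code (it assumes tf-idf word vectors, cosine similarity, and the kNN sparsification with r = log(n) described in Section 4.1, and uses scikit-learn for vectorization):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_game(docs, n_words=1000, n_topics=20, seed=0):
    # tf-idf document-term matrix restricted to the most frequent words
    vec = TfidfVectorizer(max_features=n_words, stop_words="english")
    D = vec.fit_transform(docs)                # (documents x words)

    # Word feature vectors are the columns of D; cosine similarity gives W
    W = cosine_similarity(D.T)                 # (n x n) word-word similarity
    np.fill_diagonal(W, 0.0)

    # Sparsify: keep only the r = log(n) nearest neighbours of each node
    n = W.shape[0]
    r = max(1, int(np.log(n)))
    for i in range(n):
        weak = np.argsort(W[i])[:-r]           # all but the r strongest edges
        W[i, weak] = 0.0
    W = np.maximum(W, W.T)                     # keep the graph undirected

    # Strategy space X: one distribution over topics per word, initialized
    # from a normal distribution (made positive and row-normalized here).
    rng = np.random.default_rng(seed)
    X = np.abs(rng.normal(size=(n, n_topics)))
    X /= X.sum(axis=1, keepdims=True)
    return W, X
```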
Payoff Function and System Dynamics. The payoff function of the game is constructed by exploiting the information stored in W. This matrix gives us the structural information of the corpus. It allows us to select the players with whom each player plays the games, indicated by the presence of an edge between two nodes (players), and to quantify the level of influence that each player has on the other, indicated by the weight on each edge. The absence of an edge in this graph indicates that two words are distributionally dissimilar. Using these three sources of information, we model a payoff function that forces similar players to choose similar strategies (topics) and dissimilar players to choose different ones. The payoff of a player is calculated as

$$u(x_i^h) = x_i^h \left( \sum_{j=1}^{n_i} (A_{ij} x_j)_h - \sum_{g=1}^{neg_i} (x_g)_h \right) \qquad (3)$$

where the first summation is over the n_i direct neighbors of player i, that is, the players with whom i shares some similarity, and the second summation is over the neg_i negative players of player i, that is, the players with whom i does not share any similarity. With the first summation, player i negotiates a correlated strategy (topic) with its neighbors; with the second, it deviates from the strategies chosen by the negative players. This is done by subtracting the payoff that i would have gained if these negative players had been its neighbors. The negative players are sampled from V according to frequency, in the same way negative samples are selected in word embedding models (Mikolov et al., 2013; Tripodi and Pira, 2017). The equation that gives the probability of selecting a word as negative is:

$$P(w_i) = \frac{f(w_i)^{3/4}}{\sum_{j=0}^{n} f(w_j)^{3/4}} \qquad (4)$$

where f(w_i) is the frequency of word w_i. Since the similarity with negative players is 0, we introduced a parameter to weight their influence and set it to $\bar{A}_{>0}$, the mean of the positive entries of A. The number of negative players, neg_i, is set to n_i (the number of neighbours of player i).

Once the players have played all the games with their neighbors and negative players, the average payoff of each player can be calculated with Equation (2). The payoff is higher when two words are highly correlated and have similar mixed strategies. For this reason, the replicator dynamics equation (Weibull, 1997) is used to compute the dynamics of the system. It pushes the players to be influenced by the mixed strategies of the co-players, and this influence is proportional to the similarity between two players (A_{ij}). Once the influence dynamics no longer affect the players, the Nash equilibrium of the system is reached.
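A sketch of how Equations (3) and (4) could be implemented (again illustrative and assumption-laden: A_ij is taken as w_ij · I as above, lam is a hypothetical name for the weighting parameter set from A in the paper, and negatives are resampled at every call):

```python
import numpy as np

def negative_sampling_probs(freq):
    """Eq. (4): unigram distribution raised to the 3/4 power."""
    p = np.asarray(freq, dtype=float) ** 0.75
    return p / p.sum()

def payoffs_with_negatives(X, W, freq, lam, seed=0):
    """Eq. (3): influence of neighbors minus influence of negative players.

    X    : (n, m) strategy matrix;  W : (n, n) similarity matrix.
    freq : (n,) word frequencies;   lam: weight of the negative players.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    p_neg = negative_sampling_probs(freq)
    pos = W @ X                            # sum_j w_ij * (x_j)_h
    neg = np.zeros_like(X)
    for i in range(n):
        k = max(1, int((W[i] > 0).sum()))  # neg_i = n_i, as in the paper
        g = rng.choice(n, size=k, p=p_neg) # sample the negative players
        neg[i] = lam * X[g].sum(axis=0)
    return X * (pos - neg)                 # payoff[i, h] = u(x_i^h)
```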
4 Experimental Results

In this section, we evaluate TMG and compare it with state-of-the-art systems.

4.1 Data and Setting

The datasets used to evaluate TMG are 20 Newsgroups[1] (20NG) and NIPS[2]. 20NG is a collection of about 20,000 documents organized into 20 different classes. NIPS is composed of about 1,700 NIPS conference papers published between 1987 and 1999, with no class information. Each text was tokenized and lowercased. The stop-words were removed, and the vocabulary was constructed considering the 1000 and 2000 most frequent words in 20NG and NIPS, respectively. This choice is in line with previous work (Card et al., 2018). To keep the model as simple as possible, tf-idf weighting was used to construct the feature vectors of the words, and cosine similarity was employed to create the adjacency matrix A. It is important to notice that other sources of information could easily be included at this stage, derived from pre-trained word embeddings, syntactic structures, or document metadata. A is then sparsified by taking only the r nearest neighbours of each node, with r = log(n); this operation reduces the computational cost of the algorithm and guarantees that the graph remains connected (Von Luxburg, 2007).

The strategy space of the players was initialized using a normal distribution to reduce the parameters of the framework[3]. The last two parameters of the system concern the stopping criteria of the dynamics and are: 1. the maximum number of iterations ($10^5$); and 2. the minimum difference between two iterations ($10^{-3}$), calculated as $\sum_{i=1}^{n} |x_i(t-1) - x_i(t)|$.

TMG has been compared with SCHOLAR[4], LDA[5] and NVDM[6]. We configured the NVDM network with two encoder layers (500-dimensional) and ReLU non-linearities. SCHOLAR has been configured with a more complex setting that consists of a single-layer encoder and a 4-layer generator. LDA has been run with the following parameters: α = 50, iterations = 1000, and topicthreshold = 0.

[1] http://qwone.com/~jason/20Newsgroups/
[2] http://www.cs.nyu.edu/~roweis/data.html
[3] Experimentally, it was also observed that using a Dirichlet distribution to initialize the strategy space with different α parameters did not affect the performances of the model much.
[4] https://github.com/dallascard/scholar
[5] http://mallet.cs.umass.edu
[6] https://github.com/ysmiao/nvdm

4.2 Evaluation

In this section, we compare the generalization performance of TMG with that of the models presented in the previous section. For the evaluation we used perplexity (PPL), even though it has been shown not to correlate with human interpretation of topics (Chang et al., 2009). We computed perplexity on unobserved documents C as:

$$PPL(C) = \exp\left( -\frac{1}{N} \frac{\sum_{n=1}^{N} \log P(C_n)}{\sum_{n=1}^{N} D_n} \right) \qquad (5)$$

where N is the number of documents in the collection C and D_n is the number of words in document C_n. Low perplexity suggests less uncertainty about the documents. Held-out documents represent 15% of each dataset. Perplexity is computed with 10 topics for the NIPS dataset and 20 topics for the 20 Newsgroups dataset; these numbers correspond to the real number of classes of each dataset.
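Given per-document log-likelihoods from any of the evaluated models (assumed already computed here), Equation (5) is immediate; a small sketch:

```python
import numpy as np

def perplexity(log_probs, doc_lengths):
    """Eq. (5): corpus perplexity from per-document log-likelihoods.

    log_probs   : log P(C_n) for each held-out document.
    doc_lengths : D_n, the number of words in each document.
    """
    log_probs = np.asarray(log_probs, dtype=float)
    doc_lengths = np.asarray(doc_lengths, dtype=float)
    N = len(log_probs)
    return float(np.exp(-(1.0 / N) * log_probs.sum() / doc_lengths.sum()))
```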
Dataset   TMG    SCHOLAR   NVDM   LDA
20NG      824    819       927    791
NIPS      1311   1370      1564   1017

Table 1: Comparison of the models in terms of perplexity.

Table 1 shows the comparison of perplexity. As reported in previous work (Card et al., 2018), it is difficult to achieve a lower perplexity than LDA. The results in these experiments follow the same pattern: LDA has the lowest perplexity, TMG and SCHOLAR obtain similar results, and NVDM performs slightly worse on both datasets.

[Figure 1: Internal PMI mean and std values. Panels: (a) 20NG, (b) NIPS.]

[Figure 2: External PMI mean and std values. Panels: (a) 20NG, (b) NIPS.]

[Figure 3: Sparsity mean and std values. Panels: (a) 20NG, (b) NIPS.]

4.3 Topic Coherence and Interpretability

It has been shown that perplexity does not necessarily correlate well with topic coherence (Chang et al., 2009; Srivastava and Sutton, 2017). For this reason, we also evaluated the performances of our system on coherence (Chang et al., 2009; Das et al., 2015). Coherence is calculated by computing the relatedness between topic words using pointwise mutual information (PMI). We used Wikipedia (2018.05.01 dump) as the corpus to compute co-occurrence statistics, using a sliding window of 5 words on the left and on the right of each target word. For each topic, we selected the 10 words with the highest mass. Then we calculated the PMI among all the word pairs and finally computed the coherence as the arithmetic mean of all these values. This metric has been shown to correlate well with human judgments (Lau et al., 2017). We used two different sources of information for the computation of the PMI: one is internal and corresponds to the dataset under analysis; the other is external and is represented by the English Wikipedia corpus.

Internal PMI. Figure 1 presents the PMI values of the different models computed on the two corpora. As can be seen from Figure 1a, TMG has a low PMI compared to all the other systems on the 20 Newsgroups dataset when there are few topics to extract (i.e., 2 and 5). The situation changes drastically when the number of topics increases: TMG has the highest performances on this dataset when it extracts 10, 20, 50, and 100 topics. The performances of NVDM and SCHOLAR are similar and follow a decreasing pattern, with very high values at the beginning. On the contrary, the performances of LDA follow the opposite pattern: this model seems to work better when the number of topics to extract is high. On NIPS (Figure 1b), the performances of the systems are similar to those on 20 Newsgroups. The only exception is that TMG always has the highest PMI and seems to behave better also when the number of topics to extract is high. This is probably because the number of words in NIPS is higher, and for this reason it is reasonable to have a higher number of topics. This is also confirmed by the qualitative analysis of the topics in Section 4.4, where it is shown that with low values of k it is possible to extract general topics, and that increasing its value it is possible to extract more specific ones.

In general, we can find three different patterns in these experiments: 1. NVDM and SCHOLAR work well when extracting a low number of topics; 2. LDA works well when it has to extract a large number of topics; 3. TMG works well when extracting a number of topics close to the real number of classes in the datasets. Another aspect to take into account is that even if TMG has the highest performances, its results also have a high standard deviation. This is due to the stochastic nature of negative sampling.
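The PMI-based coherence described above can be sketched as follows, assuming word and word-pair probabilities have already been estimated from the reference corpus with the 5-word sliding window (smoothing details may differ from the actual setup):

```python
import itertools
import numpy as np

def topic_coherence(top_words, p_word, p_pair):
    """Mean pairwise PMI over a topic's 10 highest-mass words.

    top_words : list of the topic's top words.
    p_word    : dict word -> marginal probability in the reference corpus.
    p_pair    : dict (w1, w2) -> co-occurrence probability within the window.
    """
    pmis = []
    for w1, w2 in itertools.combinations(top_words, 2):
        joint = p_pair.get((w1, w2)) or p_pair.get((w2, w1))
        if joint:  # skip unseen pairs; a real setup might smooth instead
            pmis.append(np.log(joint / (p_word[w1] * p_word[w2])))
    return float(np.mean(pmis)) if pmis else 0.0
```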
Topic 1 (PMI 29.71): turks, soviet, turkish, armenian, armenia, passes, roads, armenians, argic, proceeded
Topic 2 (PMI 15.27): schneider, allan, morality, keith, atheists, moral, political, pasadena, objective, animals
Topic 3 (PMI 12.7): drive, ide, scsi, controller, drives, mb, disk, isa, bus, floppy
Topic 4 (PMI 11.72): vms, disclaimer, vnews, vax, necessarily, represents, views, expressed, news, poster
Topic 5 (PMI 10.79): god, jesus, christians, christ, christianity, bible, christian, faith, church, belief
Topic 6 (PMI 10.18): intellect, banks, gordon, surrender, univ, pittsburgh, significant, hospital, level, blood
Topic 7 (PMI 8.94): bike, ride, riding, dod, bikes, motorcycle, bmw, honda, road, advice
Topic 8 (PMI 8.93): providing, encryption, clipper, key, escrow, crypto, keys, chip, secure, wiretap
Topic 9 (PMI 8.55): fbi, compound, batf, fire, waco, children, koresh, gas, branch, started
Topic 10 (PMI 7.52): gun, firearms, guns, criminals, crime, weapons, violent, criminal, weapon, armed
Topic 11 (PMI 7.45): team, game, play, season, hockey, league, nhl, players, cup, stanley
Topic 12 (PMI 7.14): space, orbit, shuttle, launch, earth, mission, flight, nasa, moon, solar
Topic 13 (PMI 6.92): male, gay, men, sexual, percentage, study, sex, apparent, showing, women
Topic 14 (PMI 6.21): tim, israel, israeli, arab, jews, arabs, policy, war, land, north
Topic 15 (PMI 6.13): amateur, georgia, intelligence, ai, programs, michael, radio, adams, ignore, occur

Table 2: Best topics extracted from 20 Newsgroups using TMG (setting k = 20), one topic per row, ordered by external PMI (in parentheses).
Topic 1 (PMI 304.85): ocular, eye, fovea, dominance, saccades, saccadic, fixation, foveal, eyes, saccade
Topic 2 (PMI 283.66): dendrites, dendritic, soma, dendrite, axonal, axons, nmda, pyramidal, somatic, axon
Topic 3 (PMI 276.39): oscillatory, oscillations, oscillators, oscillator, oscillation, synchronization, decoding, locking, synchronize, synchronized
Topic 4 (PMI 230.5): crowdsourcing, crowds, workers, worker, labelers, crowd, turk, wisdom, expertise, dawid
Topic 5 (PMI 218.51): kaiming, shaoqing, xiangyu, jian, yangqing, karen, sergey, trevor, sergio, jitendra
Topic 6 (PMI 196.86): retina, photoreceptor, retinal, vertebrate, schulten, photoreceptors, ganglion, kohonen, bipolar, visualizing
Topic 7 (PMI 176.75): auditory, sound, sounds, cochlear, ear, hearing, ears, acoust, tone, cochlea
Topic 8 (PMI 146.3): graph, edges, graphs, optimisation, edge, vertices, optimise, optimising, optimised, vertex
Topic 9 (PMI 146.25): disturbances, plant, controllers, controller, disturbance, plants, activate, activated, activating, activates
Topic 10 (PMI 145.84): lifted, propositional, predicate, grounding, predicates, domingos, clauses, compilation, formulas, logical

Table 3: Topics extracted from NIPS using TMG (setting k = 10), one topic per row, ordered by external PMI (in parentheses).
Sparsity. We compared the sparsity of the word-topic matrices X in Figures 3a and 3b, computed as $s = |X_{>10^{-3}}| / |X|$, the proportion of entries of X greater than $10^{-3}$. From both figures, we can see that TMG produces highly sparse representations, especially when the number of topics to extract is low. This is a nice feature, since it provides more interpretable results. Only SCHOLAR produces sparser representations when the number of topics to extract is high. Experimentally, we also noticed that the sparsity of X in TMG can be controlled by increasing the number of iterations of the game dynamics.
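This measure is a one-liner; a sketch of the computation as defined above:

```python
import numpy as np

def sparsity(X, threshold=1e-3):
    """Proportion of entries of the word-topic matrix X above 10^-3."""
    return float((X > threshold).mean())
```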
4.4 Qualitative Evaluation

Examples of topics extracted from 20NG and NIPS are presented in Tables 2 and 3, respectively[7]. The first difference that emerges from these results concerns the external PMI values: the texts in NIPS use a very specific language, and for this reason their PMI values are very high. We can also see that TMG groups a highly coherent set of words in each topic. We can easily identify in Table 2 the topics around which the dataset is organized, in particular: talk.politics.mideast, alt.atheism, comp.graphics, soc.religion.christian, talk.politics.misc, rec.motorcycles, sci.crypt, talk.politics.guns, rec.sport.hockey, sci.space, talk.politics.misc. We can also easily identify in Table 3 highly coherent topics, related to optics, signal analysis, optimization, crowdsourcing, audio, graph theory, and logic. We noticed that these topics are general and that it is possible to discover more specific ones by increasing the number of topics to extract. For example, we discovered topics related to topic modelling and generative adversarial networks.

[7] For space limitations, we present only 15 topics for 20NG.

5 Conclusion and Future Work

In this paper, a new topic modeling framework based on game-theoretic principles has been presented. The results of its evaluation show that the model performs well compared to state-of-the-art systems and that it can extract topically and semantically related groups of words. In this work, the model was kept as simple as possible, to assess whether a game-theoretic framework is in itself suited for topic modeling. In future work, it will be interesting to introduce the topic-document distribution, to test the model on classification tasks, and to use covariates to extract topics along different dimensions, such as time, authorship, or opinion. The framework is open and flexible, and in future work it will be tested with different initializations of the strategy space, graph structures, and payoff functions. It will be particularly interesting to test it using word embeddings and syntactic information.
References

[Bengio et al.2003] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb):1137–1155.

[Blei et al.2003] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.

[Blei2012] David M. Blei. 2012. Probabilistic topic models. Communications of the ACM, 55(4):77–84, April.

[Cao et al.2015] Ziqiang Cao, Sujian Li, Yang Liu, Wenjie Li, and Heng Ji. 2015. A novel neural topic model and its supervised extension. In AAAI, pages 2210–2216.

[Card et al.2018] Dallas Card, Chenhao Tan, and Noah A Smith. 2018. Neural models for documents with metadata. In Proceedings of the 56th Annual Meeting of the ACL, volume 1, pages 2031–2040.

[Chang et al.2009] Jonathan Chang, Sean Gerrish, Chong Wang, Jordan L Boyd-Graber, and David M Blei. 2009. Reading tea leaves: How humans interpret topic models. In NIPS, pages 288–296.

[Chong et al.2009] Wang Chong, David Blei, and Fei-Fei Li. 2009. Simultaneous image classification and annotation. In CVPR 2009, pages 1903–1910. IEEE.

[Das et al.2015] Rajarshi Das, Manzil Zaheer, and Chris Dyer. 2015. Gaussian LDA for topic models with word embeddings. In Proceedings of the 53rd Annual Meeting of the ACL, volume 1, pages 795–804.

[Eisenstein et al.2011] Jacob Eisenstein, Amr Ahmed, and Eric P Xing. 2011. Sparse additive generative models of text. In ICML.

[Gerlach et al.2018] Martin Gerlach, Tiago P. Peixoto, and Eduardo G. Altmann. 2018. A network approach to topic models. Science Advances, 4(7).

[Goodfellow et al.2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In NIPS, pages 2672–2680.

[Griffiths and Steyvers2004] Thomas L Griffiths and Mark Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl 1):5228–5235.

[Hinton and Salakhutdinov2009] Geoffrey E Hinton and Ruslan R Salakhutdinov. 2009. Replicated softmax: an undirected topic model. In NIPS, pages 1607–1614.

[Hofmann1999] Thomas Hofmann. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference, pages 50–57. ACM.

[Lafferty and Blei2006] John D Lafferty and David M Blei. 2006. Correlated topic models. In NIPS, pages 147–154.

[Larochelle and Lauly2012] Hugo Larochelle and Stanislas Lauly. 2012. A neural autoregressive topic model. In NIPS, pages 2708–2716.

[Lau et al.2017] Jey Han Lau, Timothy Baldwin, and Trevor Cohn. 2017. Topically driven neural language model. In Proceedings of the 55th Annual Meeting of the ACL, volume 1, pages 355–365.

[Mcauliffe and Blei2008] Jon D Mcauliffe and David M Blei. 2008. Supervised topic models. In NIPS, pages 121–128.

[Mikolov et al.2013] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

[Nash1951] John Nash. 1951. Non-cooperative games. Annals of Mathematics, pages 286–295.

[Pavan and Pelillo2007] Massimiliano Pavan and Marcello Pelillo. 2007. Dominant sets and pairwise clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1).

[Ranzato and Szummer2008] Marc'Aurelio Ranzato and Martin Szummer. 2008. Semi-supervised learning of compact document representations with deep networks. In Proceedings of the 25th International Conference on Machine Learning, pages 792–799. ACM.

[Roberts et al.2014] Margaret E Roberts, Brandon M Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David G Rand. 2014. Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4):1064–1082.

[Srivastava and Sutton2017] Akash Srivastava and Charles Sutton. 2017. Autoencoding variational inference for topic models. In International Conference on Learning Representations (ICLR).

[Tripodi and Navigli2019] Rocco Tripodi and Roberto Navigli. 2019. Game theory meets embeddings: a unified framework for word sense disambiguation. In Proceedings of EMNLP-IJCNLP, pages 88–99, Hong Kong, China, November. Association for Computational Linguistics.
[Tripodi and Pelillo2017] Rocco Tripodi and Marcello
Pelillo. 2017. A game-theoretic approach to word
sense disambiguation. Computational Linguistics,
43(1):31–70.
[Tripodi and Pira2017] Rocco Tripodi and Stefano Li
Pira. 2017. Analysis of italian word embeddings.
In Proceedings of the Fourth Italian Conference on
Computational Linguistics (CLiC-it 2017), Rome,
Italy, December 11-13, 2017.
[Tripodi et al.2016] Rocco Tripodi, Sebastiano Vascon,
and Marcello Pelillo. 2016. Context aware nonneg-
ative matrix factorization clustering. In 23rd Inter-
national Conference on Pattern Recognition, ICPR
2016, Cancún, Mexico, December 4-8, 2016, pages
1719–1724.
[Von Luxburg2007] Ulrike Von Luxburg. 2007. A tuto-
rial on spectral clustering. Statistics and computing,
17(4):395–416.
[Weibull1997] J. W. Weibull. 1997. Evolutionary game
theory. MIT press.