CEUR Workshop Proceedings, Vol-3033, paper 55. PDF: https://ceur-ws.org/Vol-3033/paper55.pdf
                             OCTIS 2.0:
    Optimizing and Comparing Topic Models in Italian Is Even Simpler!

                     Silvia Terragni and Elisabetta Fersini
                    University of Milano-Bicocca, Milan, Italy
      s.terragni4@campus.unimib.it, elisabetta.fersini@unimib.it



Abstract

English. OCTIS is an open-source framework for training, evaluating and comparing topic models. This tool uses single-objective Bayesian Optimization (BO) to optimize the hyper-parameters of the models and thus guarantee a fairer comparison. Yet, a single-objective approach disregards that a user may want to simultaneously optimize multiple objectives. We therefore propose OCTIS 2.0: the extension of OCTIS that addresses the problem of estimating the optimal hyper-parameter configurations for a topic model using multi-objective BO. Moreover, we also release and integrate two pre-processed Italian datasets, which can be easily used as benchmarks for the Italian language.

Italiano. OCTIS è un framework open-source per il training, la valutazione e la comparazione di Topic Models. Questo strumento utilizza l’ottimizzazione Bayesiana (BO) a singolo obiettivo per ottimizzare gli iperparametri dei modelli e quindi garantire una comparazione più equa. Tuttavia, questo approccio ignora che un utente potrebbe voler ottimizzare più di un obiettivo. Proponiamo perciò OCTIS 2.0: l’estensione di OCTIS che affronta il problema della stima delle configurazioni ottimali degli iperparametri di un topic model usando la BO multi-obiettivo. In aggiunta, rilasciamo e integriamo anche due nuovi dataset in italiano pre-processati, che possono essere facilmente utilizzati come benchmark per la lingua italiana.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Topic models are statistical methods that aim to extract the hidden topics underlying a collection of documents (Blei et al., 2003; Blei, 2012; Boyd-Graber et al., 2017). Topics are often represented by sets of words that make sense together, e.g. the words “cat, animal, dog, mouse” may represent a topic about animals. Topic models’ evaluations are usually limited to the comparison of models whose hyper-parameters are held fixed (Doan and Hoang, 2021; Terragni et al., 2020a; Terragni et al., 2020b). However, hyper-parameters can have a considerable impact on the models’ performance, and fixing them therefore prevents researchers from discovering the best topic model for the selected dataset.

Recently, OCTIS (Terragni et al., 2021a, Optimizing and Comparing Topic Models is Simple) has been released: a comprehensive, open-source framework for training, analyzing, and comparing topic models over several datasets and evaluation metrics. OCTIS determines the optimal hyper-parameter configuration according to a Bayesian Optimization (BO) strategy (Archetti and Candelieri, 2019; Snoek et al., 2012; Galuzzi et al., 2020). The framework already provides several features and resources, among which at least 8 topic models, 4 categories of evaluation metrics, and 4 pre-processed datasets. However, the framework uses a single-objective Bayesian optimization approach, disregarding that a user may want to simultaneously optimize more than one objective (Terragni and Fersini, 2021). For example, a user may be interested in obtaining topics that are coherent but also diverse and separated from each other.

Contributions. In this paper, we propose OCTIS 2.0, an extension of the existing framework that integrates both a single-objective and multi-objective hyper-parameter optimization
strategy, using Bayesian optimization. Moreover, we also pre-process and include two novel datasets in Italian. We will then briefly show the potential of the extended framework by comparing different topic models on the newly released Italian datasets. We believe these resources can be useful for the topic modeling and NLP communities, since they can be used as benchmarks for the Italian language.

2 OCTIS: Optimizing and Comparing Topic Models Is Simple!

2.1 OCTIS 1.0

OCTIS (Terragni et al., 2021a, Optimizing and Comparing Topic Models is Simple!) is an open-source evaluation framework for the comparison of topic models that allows a user to optimize the models’ hyper-parameters for a fair experimental comparison. The framework is composed of different modules that interact with each other: (1) dataset and pre-processing tools, (2) topic modeling, (3) hyper-parameter optimization, (4) evaluation metrics. OCTIS can be used both as a Python library and through a web dashboard. It also provides a set of pre-processed datasets, state-of-the-art topic models and several evaluation metrics.

We will now briefly describe the two components that we extend in this work: the pre-processed datasets and the hyper-parameter optimization module.

Pre-processing and Datasets. OCTIS currently provides functionalities for pre-processing the texts, which include the lemmatization of the text, the removal of punctuation, numbers and stop-words, and the removal of words based on their frequency. Moreover, the framework already provides 4 pre-processed datasets that are ready to use for topic modeling. These datasets are 20 NewsGroups,1 M10 (Lim and Buntine, 2014), DBLP,2 and BBC News (Greene and Cunningham, 2006). All the datasets are split into three partitions: training, testing and validation.

All the currently provided datasets are in English. OCTIS already provides language-specific pre-processing tools (e.g. lemmatizers for multiple languages), but it does not include datasets in other languages. Creating benchmark datasets for other languages is useful for investigating the peculiarities of different topic modeling methods.

Single-Objective Hyper-parameter Optimization. OCTIS uses single-objective Bayesian Optimization (Snoek et al., 2012; Shahriari et al., 2015) to tune the topic models’ hyper-parameters with respect to a selected evaluation metric. In particular, the user specifies the search space for the hyper-parameters and an objective metric. Then, BO sequentially explores the search space to determine the optimal hyper-parameter configuration. Since the models are usually probabilistic and can give different results with the same hyper-parameter configuration, the objective function is computed as the median of a given number of model runs (i.e., topic models run with the same hyper-parameter configuration), evaluated with the selected metric. OCTIS uses the Scikit-Optimize library (Head et al., 2018) for the implementation of the single-objective hyper-parameter Bayesian optimization.

The use of a single-objective approach is, however, limited: this strategy disregards all other objectives. For example, a user may want to optimize the coherence of the topics and their diversity at the same time.

2.2 OCTIS 2.0

New dataset resources for the Italian language. Since OCTIS provides only English datasets, we extend the set of datasets by including two new datasets in Italian. We build the two datasets from the Italian version of the Europarl dataset3 and from the Italian abstracts of DBPedia.4 In particular, we randomly sample 5000 documents from Europarl, and we randomly sample 1000 Italian abstracts for each of 5 DBpedia types (event, organization, place, person, work), for a total of 5000 abstracts.

We preprocess the datasets using the following strategy: we lemmatize the text; we remove punctuation, numbers and Italian stop-words; we filter out the words with a document frequency higher than 50% or lower than 0.1% for Europarl (0.2% for DBPedia); and we remove the documents with fewer than 5 words. These values have been chosen by manually inspecting the resulting pre-processed datasets.

1 http://people.csail.mit.edu/jrennie/20Newsgroups/
2 https://github.com/shiruipan/TriDNR/tree/master/data
3 https://www.statmt.org/europarl/
4 https://www.dbpedia.org/resources/ontology/

We report the most relevant statistics of the novel Italian datasets in Table 1.
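The frequency-based filtering described above can be sketched in a few lines of plain Python. This is an illustrative re-implementation, not the OCTIS pre-processing module; the default thresholds mirror the values reported in the text.

```python
from collections import Counter

def filter_vocabulary(documents, min_df=0.001, max_df=0.5, min_doc_len=5):
    """Keep only words whose document frequency lies in [min_df, max_df],
    then drop documents left with fewer than min_doc_len words."""
    n = len(documents)
    # document frequency: in how many documents each word appears
    df = Counter(word for doc in documents for word in set(doc))
    vocab = {w for w, c in df.items() if min_df <= c / n <= max_df}
    filtered = [[w for w in doc if w in vocab] for doc in documents]
    return [doc for doc in filtered if len(doc) >= min_doc_len]
```

With max_df=0.5, a word occurring in more than half of the documents (e.g. a residual stop-word) is discarded, while min_df removes words too rare to support topic statistics.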
Following the original paper, we split the datasets in three partitions: training (75%), validation (15%), and testing (15%).

Dataset     Num. of documents   Avg. doc length (Std. dev.)   Num. of unique words
DBPedia     4251                5.5 (11.8)                    2047
Europarl    3616                20.6 (19.3)                   2000

Table 1: Statistics of the pre-processed datasets.

From Single-objective to Multi-objective Hyper-parameter Bayesian Optimization. Given the limitations of the single-objective hyper-parameter optimization approach, we extend OCTIS by including a multi-objective approach (Kandasamy et al., 2020; Paria et al., 2019). Single-objective BO can in fact be generalized to multiple objective functions, where the final aim is to recover the Pareto frontier of the objective functions, i.e. the set of Pareto optimal points. A point is Pareto optimal if it cannot be improved in any of the objectives without degrading some other objective. Using a multi-objective hyper-parameter optimization approach thus allows us not only to identify the best performing model, but also to empirically discover competing objectives.

Since the original Scikit-Optimize library does not provide multi-objective optimization tools, we use the dragonfly library5 (Paria et al., 2019). As in the single-objective case, the user must specify the hyper-parameter search space; in addition, they also need to specify which functions they want to optimize. We report a simple coding example below:

# imports from the OCTIS library
from octis.dataset.dataset import Dataset
from octis.models.LDA import LDA
from octis.evaluation_metrics.diversity_metrics import TopicDiversity
from octis.evaluation_metrics.coherence_metrics import Coherence

# loading of a pre-processed dataset
dataset = Dataset()
dataset.fetch_dataset("DBPedia_IT")

# model instantiation
lda = LDA(num_topics=25)

# definition of the metrics to optimize
td = TopicDiversity()
coh = Coherence()
metrics = [td, coh]

# definition of the search space
config_file = "path/to/search/space/file"

# define and launch the multi-objective optimization
# (MOOptimizer is provided by the OCTIS 2.0 extension)
mmm = MOOptimizer(dataset=dataset, model=lda,
                  config_file=config_file,
                  metrics=metrics, maximize=True)
mmm.optimize()

The snippet will run a multi-objective optimization experiment that returns the Pareto front of the diversity and coherence metrics on the Italian dataset DBPedia, by optimizing the hyper-parameters (defined in a configuration file) of LDA with 25 topics.

In keeping with the spirit of the first version of OCTIS, the framework extension is open-source and easily accessible, in order to guarantee researchers and practitioners a fairer, accessible and reproducible comparison between the models (Bianchi and Hovy, 2021). OCTIS 2.0 is available as an extension of the original library at the following link: https://github.com/MIND-Lab/OCTIS.

5 https://github.com/dragonfly/dragonfly

3 Experimental Setting

In the following, we show the capabilities of the extended framework on the new datasets by carrying out a simple experimental campaign.

We assume an experimental setting in which a topic modeling practitioner is interested in discovering the main thematic information of the two novel datasets in Italian. However, the user has no prior knowledge of the datasets and therefore does not know which topic model is the most appropriate. Moreover, the user aims to get topics which are coherent and make sense together, but which are also diverse and separated from each other. Let us notice that a user could consider a different set of metrics to optimize, by selecting one of the already defined metrics available in OCTIS or by defining novel metrics.

3.1 Evaluation Metrics

We briefly describe the two evaluation metrics (one of topic coherence and one of topic diversity) that we target as the two objectives of the multi-objective Bayesian optimization. Both metrics need to be maximized.

IRBO (Bianchi et al., 2021a; Terragni et al., 2021b) is a measure of topic diversity (0 for identical topics and 1 for completely different topics). It is based on the Rank-Biased Overlap measure (Webber et al., 2010). Topics with common words at different rankings are penalized less than topics sharing the same words at the highest ranks.
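IRBO itself requires the full rank-biased overlap machinery; as a simpler intuition for what a topic diversity score captures, the sketch below computes the proportion of unique words among the topics’ top words. This is an illustrative measure, not IRBO, and the example topics are abridged for the demonstration.

```python
def topic_diversity(topics, topk=10):
    """Fraction of unique words among the top-k words of all topics:
    1.0 means no word is shared between topics, values near 0 mean
    highly redundant topics."""
    unique_words = set()
    total = 0
    for topic in topics:
        unique_words.update(topic[:topk])
        total += len(topic[:topk])
    return len(unique_words) / total

# two partially overlapping topics: 6 unique words out of 8
topics = [["torneo", "giocare", "tennis", "edizione"],
          ["torneo", "tennis", "tour", "atp"]]
score = topic_diversity(topics)  # 0.75
```

Unlike this word-set measure, IRBO also weighs *where* in the ranking the shared words occur, which is why the paper adopts it as the diversity objective.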
NPMI (Lau et al., 2014) measures the Normalized Pointwise Mutual Information of each pair of words (wi, wj) in the top-10 words of each topic. It is a topic coherence measure that evaluates how much the words in a topic are related to each other.

3.2 Topic Models and Hyper-Parameter Setting

We focus our experiments on four well-known topic models that OCTIS already provides: two of them are considered classical topic models and the others are neural models. In particular, we trained Latent Dirichlet Allocation (Blei et al., 2003, LDA), Non-negative Matrix Factorization (Lee and Seung, 2000, NMF), Embedded Topic Model (Dieng et al., 2020, ETM), and Contextualized Topic Models (Bianchi et al., 2021a; Bianchi et al., 2021b, CTM).

Model   Hyper-parameter         Values/Range
All     Number of topics        [5, 100]
LDA     α prior                 [10−3, 10]
        β prior                 [10−3, 10]
NMF     Regularization factor   [0, 0.5]
        L1-L2 ratio             [0, 1]
        Initialization method   nndsvd, nndsvda, nndsvdar, random
        Regularization          V matrix, H matrix, both
ETM     Activation function     elu, sigmoid, softplus, selu
        Dropout                 [0, 0.9]
        Learning rate           [10−3, 10−1]
        Number of neurons       {100, 200, ..., 900, 1000}
        Optimizer               adam, sgd, rmsprop
CTM     Activation function     elu, sigmoid, softplus, selu
        Dropout                 [0, 0.9]
        Learning rate           [10−3, 10−1]
        Momentum                [0, 0.9]
        Number of layers        1, 2, 3, 4, 5
        Number of neurons       {100, 200, ..., 900, 1000}
        Optimizer               adam, sgd, rmsprop

Table 2: Hyper-parameters and ranges.

We summarize the models’ hyper-parameters and their corresponding ranges in Table 2. For each model, we optimize the number of topics, ranging from 5 to 100 topics. We select the ranges of the hyper-parameters similarly to previous work (Terragni and Fersini, 2021).

Regarding LDA, we also optimize the α and β priors, which control the sparsity of the topics in the documents and the sparsity of the words in the topic distributions, respectively. These hyper-parameters are set to range between 10−3 and 10−1 on a logarithmic scale.

The hyper-parameters of NMF are mainly related to the regularization applied to the factorized matrices. The regularization hyper-parameter controls whether the regularization is applied only to the matrix V, only to the matrix H, or to both. The regularization factor denotes the constant that multiplies the regularization terms; it ranges between 0 and 0.5 (0 means no regularization). The L1-L2 ratio controls the ratio between L1 and L2 regularization; it ranges between 0 and 1, where 0 corresponds to L2 regularization only, 1 corresponds to L1 regularization only, and intermediate values combine the two. We also optimize the initialization method for the two matrices W and H.

Since ETM and CTM are neural models, their hyper-parameters are mainly related to the network architecture. We optimize the number of neurons (ranging from 100 to 1000, with a step of 100); for simplicity, each layer has the same number of neurons. We also consider different variants of activation functions and optimizers. We set the dropout to range between 0 and 0.9 and the learning rate to range between 10−3 and 10−1 on a logarithmic scale. We fix the batch size to 200 and adopt an early stopping criterion for determining the convergence of each model.

Moreover, only for CTM we also optimize the momentum, ranging between 0 and 0.9, and the number of layers (ranging from 1 to 5). Following (Bianchi et al., 2021b), we use the contextualized document representations derived from SentenceBERT (Reimers and Gurevych, 2019). In particular, we use the pre-trained multilingual Universal Sentence Encoder.6

6 Let us notice that there is no SentenceBERT-like model for Italian; therefore we used a multilingual one: distiluse-base-multilingual-cased-v1.

For all the models, we set the remaining parameters to their default values. Finally, we train each model 30 times and consider the median of the 30 evaluations as the evaluation of the function to be optimized.
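The median-of-30-runs objective described above can be sketched as follows. The "model" and "metric" in the example are hypothetical stand-ins (actually training a topic model is out of scope here); only the median-aggregation logic reflects the strategy described in the text.

```python
import random
import statistics

def median_objective(train_model, hyperparams, metric, n_runs=30):
    """Train the model n_runs times with the same hyper-parameter
    configuration and return the median metric value, smoothing the
    run-to-run variance of probabilistic topic models."""
    scores = [metric(train_model(hyperparams)) for _ in range(n_runs)]
    return statistics.median(scores)

# toy stand-ins: "training" returns a noisy quality score,
# and the metric simply reads it back
rng = random.Random(0)

def noisy_train(hyperparams):
    return hyperparams["quality"] + rng.gauss(0, 0.05)

value = median_objective(noisy_train, {"quality": 0.3}, lambda m: m)
```

The Bayesian optimizer then treats `median_objective` as the (expensive) black-box function to be maximized.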
We sample the n initial configurations using Latin Hypercube Sampling, with n equal to the number of hyper-parameters to optimize plus 2, so as to provide enough configurations for fitting the initial surrogate model. The total number of BO iterations for each model is 125. We use a Gaussian Process as the probabilistic surrogate model and the Upper Confidence Bound (UCB) as the acquisition function.

4 Results

In the following, we report the results of the comparative analysis between the considered models on the Italian datasets.

4.1 Quantitative Results

Figure 1: Pareto front of the performance of the considered models for the analyzed Italian datasets.

We jointly consider the results of both objectives by plotting the Pareto frontier of the results of topic diversity and topic coherence. Figure 1 shows the frontier of each model for the pair of metrics (NPMI, IRBO). We can notice that the topic models have similar frontiers on each dataset. The most competitive models are NMF and CTM. In particular, NMF outperforms the others in topic coherence, but its coherence decreases as the diversity increases. Therefore, CTM is the model to prefer if a user wants to get totally separated topics with good coherence. Instead, LDA and ETM have lower performance than the others. We also noticed from our experiments that the performance of ETM is affected when the documents are shorter (on the Europarl dataset), often giving rise to the phenomenon of mode collapse, i.e. all the topics becoming identical to one another.

4.2 Qualitative Results

In Table 3 we report an example of the topics discovered by the models. We selected the best hyper-parameter configuration discovered for each model with 5 topics and randomly sampled one model run among the 30 runs. Let us notice that, for the sake of simplicity, we have to fix the number of topics here and select a single run among the total of 30 runs; therefore, the qualitative results reported in Table 3 may not reflect the overall results.

We can notice that NMF obtains more coherent and stable topics. CTM and LDA obtain topics that have a higher variance: in particular, CTM discovers a topic (the fourth one, NPMI=-0.51) that lowers the average coherence, while LDA discovers a topic (the second one, NPMI=0.48) that effectively increases the average coherence. On the other hand, the topics discovered by ETM are more stable but have a lower coherence on average. As already observed in previous work (AlSumait et al., 2009; Doogan and Buntine, 2021), obtaining junk or mixed topics is common in topic models, and this problem can be addressed by filtering out the topics that are less relevant.

5 Conclusion

In this paper, we presented OCTIS 2.0, the extension of the evaluation framework OCTIS for topic modeling. This tool can now address the problem of estimating the optimal hyper-parameter configurations of different topic models using a multi-objective Bayesian optimization approach. Moreover, we also released two novel datasets in Italian which can be used as benchmark datasets for the Italian topic modeling and NLP communities.

We conducted a simple experimental campaign to show the potential of the extended framework. We have seen that using a multi-objective hyper-parameter optimization approach allows us not only to identify the best performing model over the others, thus guaranteeing a fairer comparison among different models, but also to empirically discover the relationships between different objectives.
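The Pareto-dominance criterion underlying both Figure 1 and the multi-objective search can be made concrete with a small self-contained sketch (plain Python, independent of the dragonfly implementation); the metric values below are invented for illustration.

```python
def pareto_front(points):
    """Return the Pareto-optimal subset of (coherence, diversity) pairs,
    assuming both objectives are maximized and points are distinct."""
    front = []
    for p in points:
        # p is dominated if some other point is at least as good in
        # every objective (and the two points differ)
        dominated = any(
            q != p and all(q[i] >= p[i] for i in range(len(p)))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# four hypothetical (NPMI, IRBO) evaluations of four configurations
runs = [(0.10, 0.95), (0.15, 0.80), (0.05, 0.90), (0.15, 0.95)]
best = pareto_front(runs)  # only (0.15, 0.95) is non-dominated
```

When no single configuration dominates all others, the front contains several points, each representing a different trade-off between coherence and diversity.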
 Model    Top words                                                                                          NPMI
          de album pubblicare italiano the uniti situare fondare università noto                            -0.05
          torneo giocare tennis edizione tour atp ambito open categoria cemento                               0.48
 LDA      film pubblicare the album serie musicale venire statunitense rock band                              0.11
          guerra battaglia venire situare statunitense spagnolo partito esercito distretto mondiale          -0.14
          comune campionato squadra abitante calcio regione situare società francese vincere                -0.03
          comune abitante dipartimento regione situare francese alta distretto est grand                      0.29
          torneo giocare tennis tour atp open edizione ambito categoria cemento                               0.48
 NMF      album pubblicare studio the musicale statunitense records singolo cantante rock                     0.29
          calciatore ruolo allenatore calcio centrocampista difensore attaccante portiere settembre aprile    0.24
          contea america uniti situare comune censimento designated census place capoluogo                    0.39
          album the pubblicare band statunitense singolo brano of musicale rock                               0.26
          superare argentino calciatore el buenos maria en svezia situare chiesa                             -0.29
 CTM      partito battaglia guerra venire politico de linea isola stazione regno                             -0.08
          st stella vendetta dollaro robert company ritorno west superiore soggetto                          -0.51
          edizione tennis giocare torneo vincere tour campionato maschile disputare squadra                   0.18
          sede de italiano fondare nome azienda noto francese compagnia parigi                                0.06
          guerra partito battaglia venire nord politico tedesco esercito regno militare                       0.03
 ETM      torneo situare comune giocare abitante edizione tennis tour regione uniti                          -0.10
          film serie the dirigere gioco pubblicare statunitense televisivo venire romanzo                     0.07
          album pubblicare campionato squadra musicale the calcio statunitense singolo vincere               -0.12

Table 3: Example of top words of 5 topics for each considered model and the corresponding topic
coherence (NPMI).
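Per-topic NPMI values like the ones reported in Table 3 can be approximated with a small document-level estimator. This is an illustrative sketch, not the OCTIS implementation, and the toy corpus is invented; the boundary conventions for never/always co-occurring pairs are assumptions noted in the comments.

```python
import math
from itertools import combinations

def topic_npmi(topic_words, documents):
    """Average NPMI over all pairs of a topic's top words, estimating
    word (co-)occurrence probabilities from document-level presence."""
    docs = [set(d) for d in documents]
    n = len(docs)

    def p(*words):
        return sum(all(w in d for w in words) for d in docs) / n

    scores = []
    for wi, wj in combinations(topic_words, 2):
        p_ij = p(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # convention: never co-occur, minimum NPMI
        elif p_ij == 1.0:
            scores.append(1.0)   # convention: always co-occur, maximum NPMI
        else:
            pmi = math.log(p_ij / (p(wi) * p(wj)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)

corpus = [["cat", "dog"], ["cat", "dog"], ["bird"], ["cat", "bird"]]
score = topic_npmi(["cat", "dog"], corpus)  # ~0.415: frequent co-occurrence
```

Word pairs that co-occur more often than chance push the score towards 1, while pairs that never co-occur pull it towards -1, matching the negative NPMI of the junk topics in Table 3.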


As future work, we aim to extend the framework by considering additional datasets in different and possibly low-resource languages, which require different pre-processing strategies and would allow researchers to investigate the peculiarities of different topic modeling methods.

References

Loulwah AlSumait, Daniel Barbará, James Gentle, and Carlotta Domeniconi. 2009. Topic significance ranking of LDA generative models. In Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, volume 5781 of Lecture Notes in Computer Science, pages 67–82. Springer.

Francesco Archetti and Antonio Candelieri. 2019. Bayesian Optimization and Data Science. Springer International Publishing.

Federico Bianchi and Dirk Hovy. 2021. On the gap between adoption and understanding in NLP. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3895–3901.

Federico Bianchi, Silvia Terragni, and Dirk Hovy. 2021a. Pre-training is a hot topic: Contextualized document embeddings improve topic coherence. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, pages 759–766. Association for Computational Linguistics.

Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, and Elisabetta Fersini. 2021b. Cross-lingual contextualized topic models with zero-shot learning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, EACL 2021, pages 1676–1683. Association for Computational Linguistics.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.

David M. Blei. 2012. Probabilistic topic models. Communications of the ACM, 55(4):77–84.

Jordan L. Boyd-Graber, Yuening Hu, and David M. Mimno. 2017. Applications of topic models. Foundations and Trends in Information Retrieval, 11(2-3):143–296.

Adji Bousso Dieng, Francisco J. R. Ruiz, and David M. Blei. 2020. Topic modeling in embedding spaces. Transactions of the Association for Computational Linguistics, 8:439–453.

Thanh-Nam Doan and Tuan-Anh Hoang. 2021. Benchmarking neural topic models: An empirical
Federico Bianchi, Silvia Terragni, and Dirk Hovy.             study. In Findings of the Association for Com-
  2021a. Pre-training is a hot topic: Contextual-             putational Linguistics: ACL-IJCNLP 2021, pages
  4363–4368, Online, August. Association for Com-          Processing and the 9th International Joint Confer-
  putational Linguistics.                                  ence on Natural Language Processing, (EMNLP-
                                                           IJCNLP), pages 3980–3990. Association for Com-
Caitlin Doogan and Wray L. Buntine. 2021. Topic            putational Linguistics.
  model or topic twaddle? re-evaluating semantic in-
  terpretability measures. In Proceedings of the 2021    Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P
  Conference of the North American Chapter of the          Adams, and Nando De Freitas. 2015. Taking the hu-
  Association for Computational Linguistics: Human         man out of the loop: A review of bayesian optimiza-
  Language Technologies, NAACL-HLT 2021, Online,           tion. Proceedings of the IEEE, 104(1):148–175.
  June 6-11, 2021, pages 3824–3848. Association for
  Computational Linguistics.                             Jasper Snoek, Hugo Larochelle, and Ryan P. Adams.
                                                            2012. Practical Bayesian Optimization of Machine
Bruno Giovanni Galuzzi, Ilaria Giordani, Antonio Can-       Learning Algorithms. In Advances in Neural Infor-
  delieri, Riccardo Perego, and Francesco Archetti.         mation Processing Systems 25: 26th Annual Con-
  2020. Hyperparameter optimization for recom-              ference on Neural Information Processing Systems,
  mender systems through bayesian optimization.             pages 2960–2968.
  Computational Management Science, pages 1–21.
                                                         Silvia Terragni and Elisabetta Fersini. 2021. An em-
Derek Greene and Pádraig Cunningham. 2006. Practi-         pirical analysis of topic models: Uncovering the
  cal Solutions to the Problem of Diagonal Dominance        relationships between hyperparameters, document
  in Kernel Document Clustering. In Proceedings             length and performance measures. In Recent Ad-
  of the 23rd International Conference on Machine           vances in Natural Language Processing (RANLP).
  learning (ICML’06), pages 377–384. ACM Press.
                                                         Silvia Terragni, Elisabetta Fersini, and Enza Messina.
Tim Head, Gilles Louppe MechCoder, Iaroslav
                                                            2020a. Constrained relational topic models. Infor-
  Shcherbatyi, et al. 2018. scikit-optimize/scikit-
                                                            mation Sciences, 512:581 – 594.
  optimize: v0. 5.2.
Kirthevasan Kandasamy, Karun Raju Vysyaraju, Willie      Silvia Terragni, Debora Nozza, Elisabetta Fersini, and
  Neiswanger, Biswajit Paria, Christopher R. Collins,       Messina Enza. 2020b. Which matters most? com-
  Jeff Schneider, Barnabás Póczos, and Eric P. Xing.      paring the impact of concept and document relation-
  2020. Tuning Hyperparameters without Grad Stu-            ships in topic models. In Proceedings of the First
  dents: Scalable and Robust Bayesian Optimisation          Workshop on Insights from Negative Results in NLP,
  with Dragonfly. Journal of Machine Learning Re-           pages 32–40.
  search, 21:81:1–81:27.
                                                         Silvia Terragni, Elisabetta Fersini, Bruno Giovanni
Jey Han Lau, David Newman, and Timothy Baldwin.             Galuzzi, Pietro Tropeano, and Antonio Candelieri.
   2014. Machine reading tea leaves: Automatically          2021a. OCTIS: Comparing and Optimizing Topic
   evaluating topic coherence and topic model quality.      models is Simple! In Proceedings of the 16th Con-
   In Proceedings of the 14th Conference of the Euro-       ference of the European Chapter of the Association
   pean Chapter of the Association for Computational        for Computational Linguistics: System Demonstra-
   Linguistics, EACL 2014, pages 530–539.                   tions, EACL 2021, pages 263–270. Association for
                                                            Computational Linguistics.
Daniel D. Lee and H. Sebastian Seung. 2000. Al-
  gorithms for non-negative matrix factorization. In     Silvia Terragni, Elisabetta Fersini, and Enza Messina.
  Advances in Neural Information Processing Systems         2021b. Word embedding-based topic similarity
  13, Papers from Neural Information Processing Sys-        measures. In Natural Language Processing and In-
  tems (NIPS) 2000, pages 556–562. MIT Press.               formation Systems - 26th International Conference
                                                            on Applications of Natural Language to Informa-
Kar Wai Lim and Wray L. Buntine. 2014. Bibli-               tion Systems, NLDB 2021, volume 12801 of Lecture
  ographic analysis with the citation network topic         Notes in Computer Science, pages 33–45. Springer.
  model. In Proceedings of the Sixth Asian Confer-
  ence on Machine Learning, ACML 2014.                   William Webber, Alistair Moffat, and Justin Zobel.
                                                           2010. A similarity measure for indefinite rankings.
Biswajit Paria, Kirthevasan Kandasamy, and Barnabás       ACM Trans. Inf. Syst., 28(4):20:1–20:38.
  Póczos. 2019. A Flexible Framework for Multi-
  Objective Bayesian Optimization using Random
  Scalarizations. In Proceedings of the Thirty-Fifth
  Conference on Uncertainty in Artificial Intelligence
  (UAI), volume 115 of Proceedings of Machine
  Learning Research, pages 766–776, Tel Aviv, Israel.
  AUAI Press.
Nils Reimers and Iryna Gurevych. 2019. Sentence-
  BERT: Sentence Embeddings using Siamese BERT-
  Networks. In Proceedings of the 2019 Confer-
  ence on Empirical Methods in Natural Language