OCTIS 2.0: Optimizing and Comparing Topic Models in Italian Is Even Simpler!

Silvia Terragni and Elisabetta Fersini
University of Milano-Bicocca, Milan, Italy
s.terragni4@campus.unimib.it, elisabetta.fersini@unimib.it

Abstract

OCTIS is an open-source framework for training, evaluating and comparing Topic Models. This tool uses single-objective Bayesian Optimization (BO) to optimize the hyper-parameters of the models and thus guarantee a fairer comparison. Yet, a single-objective approach disregards that a user may want to simultaneously optimize multiple objectives. We therefore propose OCTIS 2.0: the extension of OCTIS that addresses the problem of estimating the optimal hyper-parameter configurations for a topic model using multi-objective BO. Moreover, we also release and integrate two pre-processed Italian datasets, which can easily be used as benchmarks for the Italian language.

1 Introduction

Topic models are statistical methods that aim to extract the hidden topics underlying a collection of documents (Blei et al., 2003; Blei, 2012; Boyd-Graber et al., 2017). Topics are often represented by sets of words that make sense together, e.g. the words "cat, animal, dog, mouse" may represent a topic about animals. Evaluations of topic models are usually limited to the comparison of models whose hyper-parameters are held fixed (Doan and Hoang, 2021; Terragni et al., 2020a; Terragni et al., 2020b). However, hyper-parameters can have a substantial impact on a model's performance, so fixing them prevents researchers from discovering the best topic model on the selected dataset.

Recently, OCTIS (Terragni et al., 2021a, Optimizing and Comparing Topic Models is Simple) has been released: a comprehensive, open-source framework for training, analyzing, and comparing topic models over several datasets and evaluation metrics. OCTIS determines the optimal hyper-parameter configuration according to a Bayesian Optimization (BO) strategy (Archetti and Candelieri, 2019; Snoek et al., 2012; Galuzzi et al., 2020). The framework already provides several features and resources, among which at least 8 topic models, 4 categories of evaluation metrics, and 4 pre-processed datasets. However, the framework uses a single-objective Bayesian optimization approach, disregarding that a user may want to simultaneously optimize more than one objective (Terragni and Fersini, 2021). For example, a user may be interested in obtaining topics that are coherent but also diverse and separated from each other.
Contributions. In this paper, we propose OCTIS 2.0, an extension of the existing framework that integrates both a single-objective and a multi-objective hyper-parameter optimization strategy, using Bayesian optimization. Moreover, we also pre-process and include two novel datasets in Italian. We then briefly show the potential of the extended framework by comparing different topic models on the newly released Italian datasets. We believe these resources can be useful for the topic modeling and NLP communities, since they can be used as benchmarks for the Italian language.

2 OCTIS: Optimizing and Comparing Topic Models Is Simple!

2.1 OCTIS 1.0

OCTIS (Terragni et al., 2021a, Optimizing and Comparing Topic Models is Simple!) is an open-source evaluation framework for the comparison of topic models that allows a user to optimize the models' hyper-parameters for a fair experimental comparison. The framework is composed of different modules that interact with each other: (1) dataset and pre-processing tools, (2) topic modeling, (3) hyper-parameter optimization, and (4) evaluation metrics. OCTIS can be used both as a Python library and through a web dashboard. It also provides a set of pre-processed datasets, state-of-the-art topic models and several evaluation metrics. We now briefly describe the two components that we extend in this work: the pre-processed datasets and the hyper-parameter optimization module.

Pre-processing and Datasets. OCTIS currently provides functionalities for pre-processing texts, which include lemmatization, the removal of punctuation, numbers and stop-words, and the removal of words based on their frequency. Moreover, the framework already provides 4 pre-processed datasets that are ready to use for topic modeling: 20 NewsGroups,[1] M10 (Lim and Buntine, 2014), DBLP,[2] and BBC News (Greene and Cunningham, 2006). All the datasets are split into three partitions: training, testing and validation.

All the currently provided datasets are in English. OCTIS already provides language-specific pre-processing tools (e.g. lemmatizers for multiple languages), but it does not include datasets in other languages. Creating benchmark datasets for other languages is useful for investigating the peculiarities of different topic modeling methods.

[1] http://people.csail.mit.edu/jrennie/20Newsgroups/
[2] https://github.com/shiruipan/TriDNR/tree/master/data

Single-Objective Hyper-parameter Optimization. OCTIS uses single-objective Bayesian Optimization (Snoek et al., 2012; Shahriari et al., 2015) to tune the topic models' hyper-parameters with respect to a selected evaluation metric. In particular, the user specifies the search space for the hyper-parameters and an objective metric. Then, BO sequentially explores the search space to determine the optimal hyper-parameter configuration. Since the models are usually probabilistic and can give different results with the same hyper-parameter configuration, the objective function is computed as the median of the selected evaluation metric over a given number of model runs (i.e., topic models run with the same hyper-parameter configuration). OCTIS relies on the Scikit-Optimize library (Head et al., 2018) for the implementation of the single-objective hyper-parameter Bayesian optimization.
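For reference, a minimal single-objective run with the OCTIS 1.0 API looks as follows; this is an illustrative sketch, and the dataset, search space and optimization budget below are example choices rather than the settings used in this paper:

# a minimal single-objective sketch based on the OCTIS 1.0 API;
# the dataset, search space and budget are illustrative choices
from octis.dataset.dataset import Dataset
from octis.models.LDA import LDA
from octis.evaluation_metrics.coherence_metrics import Coherence
from octis.optimization.optimizer import Optimizer
from skopt.space.space import Real

dataset = Dataset()
dataset.fetch_dataset("20NewsGroup")          # one of the built-in English datasets

model = LDA(num_topics=20)
npmi = Coherence(texts=dataset.get_corpus())  # objective metric (NPMI coherence)

# search space over the LDA priors
search_space = {"alpha": Real(low=0.001, high=5.0),
                "eta": Real(low=0.001, high=5.0)}

optimizer = Optimizer()
result = optimizer.optimize(
    model, dataset, npmi, search_space,
    number_of_call=30,   # BO iterations
    model_runs=5,        # runs per configuration; their median is the objective
    save_path="results/")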
The use of a single-objective approach is however limited: this strategy disregards any other objective. For example, a user may want to optimize the coherence of the topics and their diversity at the same time.

2.2 OCTIS 2.0

New dataset resources for the Italian language. Since OCTIS provides only English datasets, we extend the set of datasets by including two new datasets in Italian. We build the two datasets from the Italian version of the Europarl dataset[3] and from the Italian abstracts of DBPedia.[4] In particular, we randomly sample 5000 documents from Europarl, and we randomly sample 1000 Italian abstracts for each of 5 DBpedia types (event, organization, place, person, work), for a total of 5000 abstracts.

We preprocess the datasets using the following strategy: we lemmatize the text; we remove punctuation, numbers and Italian stop-words; we filter out the words with a document frequency higher than 50% or lower than 0.1% for Europarl and 0.2% for DBPedia; and we remove the documents with fewer than 5 words. These values have been chosen by manually inspecting the resulting pre-processed datasets.

We report the most relevant statistics of the novel Italian datasets in Table 1. Following the original paper, we split the datasets in three partitions: training (75%), validation (15%), and testing (15%).

Dataset     Num. of documents   Avg. doc length (Std. dev.)   Num. of unique words
DBPedia     4251                5.5 (11.8)                    2047
Europarl    3616                20.6 (19.3)                   2000

Table 1: Statistics of the pre-processed datasets.

[3] https://www.statmt.org/europarl/
[4] https://www.dbpedia.org/resources/ontology/

From Single-objective to Multi-objective Hyper-parameter Bayesian Optimization. Given the limitations of the single-objective hyper-parameter optimization approach, we extend OCTIS by including a multi-objective approach (Kandasamy et al., 2020; Paria et al., 2019). Single-objective BO can in fact be generalized to multiple objective functions, where the final aim is to recover the Pareto frontier of the objective functions, i.e. the set of Pareto optimal points. A point is Pareto optimal if it cannot be improved in any of the objectives without degrading some other objective. Using a multi-objective hyper-parameter optimization approach thus allows us not only to identify the best performing model, but also to empirically discover competing objectives.
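This notion admits a compact formal statement (the standard definition for maximization problems; the notation is ours, with $f_1, \dots, f_k$ the objective metrics and $X$ the hyper-parameter search space):

% standard definition of Pareto optimality for maximization
A configuration $x^{*} \in X$ is Pareto optimal iff there is no $x \in X$ such that
\[
  f_i(x) \ge f_i(x^{*}) \;\; \text{for all } i \in \{1, \dots, k\}
  \quad \text{and} \quad
  f_j(x) > f_j(x^{*}) \;\; \text{for some } j .
\]
The Pareto frontier is the set of objective vectors $(f_1(x^{*}), \dots, f_k(x^{*}))$ attained by the Pareto optimal configurations.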
Since the original Scikit-Optimize library does not provide multi-objective optimization tools, we use the dragonfly library[5] (Paria et al., 2019). As in the single-objective case, the user must specify the hyper-parameter search space; in addition, they also need to specify which functions they want to optimize. We report a simple coding example below:

# imports from the OCTIS library (the MOOptimizer class is provided by
# the OCTIS 2.0 extension; see the repository for its import path)
from octis.dataset.dataset import Dataset
from octis.models.LDA import LDA
from octis.evaluation_metrics.diversity_metrics import TopicDiversity
from octis.evaluation_metrics.coherence_metrics import Coherence

# loading of a pre-processed dataset
dataset = Dataset()
dataset.fetch_dataset("DBPedia_IT")

# model instantiation
lda = LDA(num_topics=25)

# definition of the metrics to optimize
td = TopicDiversity()
coh = Coherence()
metrics = [td, coh]

# definition of the search space
config_file = "path/to/search/space/file"

# define and launch the optimization
mmm = MOOptimizer(
    dataset=dataset, model=lda,
    config_file=config_file,
    metrics=metrics, maximize=True)
mmm.optimize()

The snippet runs a multi-objective optimization experiment that returns the Pareto front of the diversity and coherence metrics on the Italian dataset DBPedia, optimizing the hyper-parameters (defined in a configuration file) of LDA with 25 topics.

In keeping with the spirit of the first version of OCTIS, the framework extension is open-source and easily accessible, in order to guarantee researchers and practitioners a fairer, accessible and reproducible comparison between the models (Bianchi and Hovy, 2021). OCTIS 2.0 is available as an extension of the original library at the following link: https://github.com/mind-Lab/octis.

[5] https://github.com/dragonfly/dragonfly

3 Experimental Setting

In the following, we show the capabilities of the extended framework on the new datasets by carrying out a simple experimental campaign. We assume an experimental setting in which a topic modeling practitioner is interested in discovering the main thematic information of the two novel datasets in Italian. However, the user does not have prior knowledge of the datasets and therefore does not know which topic model is the most appropriate. Moreover, the user aims to obtain topics which are coherent and make sense together, but which are also diverse and separated from each other. Let us notice that a user could consider a different set of metrics to optimize, by selecting one of the already defined metrics available in OCTIS or by defining novel metrics.

3.1 Evaluation Metrics

We briefly describe the two evaluation metrics (one of topic coherence and one of topic diversity) that we target as the two objectives of the multi-objective Bayesian optimization. Both metrics need to be maximized.

IRBO (Bianchi et al., 2021a; Terragni et al., 2021b) is a measure of topic diversity (0 for identical topics and 1 for completely different topics). It is based on the Rank-Biased Overlap measure (Webber et al., 2010). Topics with common words at different ranks are penalized less than topics sharing the same words at the highest ranks.

NPMI (Lau et al., 2014) measures the Normalized Pointwise Mutual Information of each pair of words (w_i, w_j) among the 10 top words of each topic. It is a topic coherence measure that evaluates how much the words in a topic are related to each other.
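To fix ideas, the two objectives can be written down explicitly; these are the standard formulations from the cited papers, in our notation, with $T$ the number of topics and $t_i$ the ranked list of the 10 top words of topic $i$:

% NPMI for a word pair, with P estimated from word co-occurrence counts;
% the topic score averages over all pairs in the 10 top words, then over topics
\[
  \mathrm{NPMI}(w_i, w_j) =
  \frac{\log \dfrac{P(w_i, w_j)}{P(w_i)\,P(w_j)}}{-\log P(w_i, w_j)},
  \qquad
  \mathrm{IRBO} = 1 - \frac{1}{\binom{T}{2}} \sum_{i < j} \mathrm{RBO}(t_i, t_j),
\]

where RBO is the Rank-Biased Overlap between two ranked lists, so IRBO equals 1 minus the average pairwise overlap of the topics.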
3.2 Topic Models and Hyper-Parameter Setting

We focus our experiments on four well-known topic models that OCTIS already provides; two of them are classical topic models and the other two are neural models. In particular, we train Latent Dirichlet Allocation (Blei et al., 2003, LDA), Non-negative Matrix Factorization (Lee and Seung, 2000, NMF), the Embedded Topic Model (Dieng et al., 2020, ETM), and Contextualized Topic Models (Bianchi et al., 2021a; Bianchi et al., 2021b, CTM).

We summarize the models' hyper-parameters and their corresponding ranges in Table 2. For each model, we optimize the number of topics, ranging from 5 to 100. We select the ranges of the hyper-parameters similarly to previous work (Terragni and Fersini, 2021).

Regarding LDA, we also optimize the α and β priors, which control the sparsity of the topics in the documents and the sparsity of the words in the topic distributions, respectively. These hyper-parameters are set to range between 10^-3 and 10 on a logarithmic scale.

The hyper-parameters of NMF are mainly related to the regularization applied to the factorized matrices. The regularization hyper-parameter controls whether the regularization is applied only to the matrix V, only to the matrix H, or to both. The regularization factor denotes the constant that multiplies the regularization terms; it ranges between 0 and 0.5 (0 means no regularization). The L1-L2 ratio controls the balance between L1 and L2 regularization: it ranges between 0 and 1, where 0 corresponds to L2 regularization only, 1 corresponds to L1 regularization only, and intermediate values correspond to a combination of the two. We also optimize the initialization method for the two matrices W and H.

Since ETM and CTM are neural models, their hyper-parameters are mainly related to the network architecture. We optimize the number of neurons, ranging from 100 to 1000 with a step of 100. For simplicity, each layer has the same number of neurons. We also consider different variants of activation functions and optimizers. We set the dropout to range between 0 and 0.9, and the learning rate to range between 10^-3 and 10^-1 on a logarithmic scale. We fix the batch size to 200 and adopt an early stopping criterion to determine the convergence of each model.

Moreover, only for CTM, we also optimize the momentum, ranging between 0 and 0.9, and the number of layers, ranging from 1 to 5. Following (Bianchi et al., 2021b), we use the contextualized document representations derived from SentenceBERT (Reimers and Gurevych, 2019). In particular, we use the pre-trained multilingual Universal Sentence Encoder.[6]

[6] Let us notice that there is no SentenceBERT-like model for Italian; therefore, we used a multilingual one: distiluse-base-multilingual-cased-v1.

Model   Hyper-parameter         Values/Range
All     Number of topics        [5, 100]

LDA     α prior                 [10^-3, 10]
        β prior                 [10^-3, 10]

NMF     Regularization factor   [0, 0.5]
        L1-L2 ratio             [0, 1]
        Initialization method   nndsvd, nndsvda, nndsvdar, random
        Regularization          V matrix, H matrix, both

ETM     Activation function     elu, sigmoid, softplus, selu
        Dropout                 [0, 0.9]
        Learning rate           [10^-3, 10^-1]
        Number of neurons       {100, 200, ..., 900, 1000}
        Optimizer               adam, sgd, rmsprop

CTM     Activation function     elu, sigmoid, softplus, selu
        Dropout                 [0, 0.9]
        Learning rate           [10^-3, 10^-1]
        Momentum                [0, 0.9]
        Number of layers        1, 2, 3, 4, 5
        Number of neurons       {100, 200, ..., 900, 1000}
        Optimizer               adam, sgd, rmsprop

Table 2: Hyper-parameters and ranges.
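As an illustration of how ranges like those in Table 2 map onto code, the sketch below encodes the LDA rows as Scikit-Optimize dimensions; the hyper-parameter names follow the OCTIS LDA wrapper, while the exact format of the configuration file consumed by MOOptimizer is not shown in this paper, so this encoding is an assumption for illustration only:

# a possible encoding of the LDA rows of Table 2 as a search space;
# the skopt dimension classes are real, but the config-file format
# expected by the OCTIS 2.0 extension is an assumption here
from skopt.space.space import Integer, Real

lda_search_space = {
    "num_topics": Integer(low=5, high=100),                 # shared by all models
    "alpha": Real(low=1e-3, high=10, prior="log-uniform"),  # α prior
    "eta": Real(low=1e-3, high=10, prior="log-uniform"),    # β prior
}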
For all the models, we set the remaining parameters to their default values. Finally, we train each model 30 times and consider the median of the 30 evaluations as the value of the function to be optimized. We sample the n initial configurations using Latin Hypercube Sampling, with n equal to the number of hyper-parameters to optimize plus 2, so as to provide enough configurations for the initial surrogate model to fit. The total number of BO iterations for each model is 125. We use a Gaussian Process as the probabilistic surrogate model and the Upper Confidence Bound (UCB) as the acquisition function.

4 Results

In the following, we report the results of the comparative analysis between the considered models on the Italian datasets.

4.1 Quantitative Results

We jointly consider the results of both objectives by plotting the Pareto frontier of the results of topic diversity and topic coherence. Figure 1 shows the frontier of each model for the pair of metrics (NPMI, IRBO).

Figure 1: Pareto front of the performance of the considered models for the analyzed Italian datasets.

We can notice that the topic models have similar frontiers in each dataset. The most competitive models are NMF and CTM. In particular, NMF outperforms the others in topic coherence, but its coherence drops as the diversity increases. Therefore, CTM is the model to prefer if a user wants totally separated topics that still retain good coherence. Instead, LDA and ETM have lower performance than the others. We also noticed from our experiments that the performance of ETM is affected when the documents are shorter (on the Europarl dataset), often giving rise to the phenomenon of mode collapse, i.e. all the topics becoming equal to one another.
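For readers who want to reproduce such plots, the frontier can be extracted from the evaluated configurations with a few lines of plain Python; this is a generic sketch rather than an OCTIS API, and the values below are toy numbers, not results from the paper:

# a minimal sketch (not part of OCTIS): keep only the non-dominated
# (NPMI, IRBO) pairs among the evaluated configurations;
# both metrics are to be maximized
def dominates(q, p):
    """q dominates p if q is no worse in every objective and better in one."""
    return all(qi >= pi for qi, pi in zip(q, p)) and \
           any(qi > pi for qi, pi in zip(q, p))

def pareto_front(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

evals = [(0.10, 0.90), (0.20, 0.80), (0.05, 0.95), (0.15, 0.70)]
print(pareto_front(evals))  # [(0.1, 0.9), (0.2, 0.8), (0.05, 0.95)]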
rameter optimization approach allows us not only In particular, NMF outperforms the others for the to identify the best performing model over the oth- Model Top words NPMI de album pubblicare italiano the uniti situare fondare università noto -0.05 torneo giocare tennis edizione tour atp ambito open categoria cemento 0.48 LDA film pubblicare the album serie musicale venire statunitense rock band 0.11 guerra battaglia venire situare statunitense spagnolo partito esercito distretto mondiale -0.14 comune campionato squadra abitante calcio regione situare società francese vincere -0.03 comune abitante dipartimento regione situare francese alta distretto est grand 0.29 torneo giocare tennis tour atp open edizione ambito categoria cemento 0.48 NMF album pubblicare studio the musicale statunitense records singolo cantante rock 0.29 calciatore ruolo allenatore calcio centrocampista difensore attaccante portiere settembre aprile 0.24 contea america uniti situare comune censimento designated census place capoluogo 0.39 album the pubblicare band statunitense singolo brano of musicale rock 0.26 superare argentino calciatore el buenos maria en svezia situare chiesa -0.29 CTM partito battaglia guerra venire politico de linea isola stazione regno -0.08 st stella vendetta dollaro robert company ritorno west superiore soggetto -0.51 edizione tennis giocare torneo vincere tour campionato maschile disputare squadra 0.18 sede de italiano fondare nome azienda noto francese compagnia parigi 0.06 guerra partito battaglia venire nord politico tedesco esercito regno militare 0.03 ETM torneo situare comune giocare abitante edizione tennis tour regione uniti -0.10 film serie the dirigere gioco pubblicare statunitense televisivo venire romanzo 0.07 album pubblicare campionato squadra musicale the calcio statunitense singolo vincere -0.12 Table 3: Example of top words of 5 topics for each considered model and the corresponding topic coherence (NPMI). ers, thus guaranteeing a fairer comparison among ized document embeddings improve topic coher- different models, but also to empirically discover ence. In Proceedings of the 59th Annual Meet- ing of the Association for Computational Linguis- the relationships between different objectives. tics and the 11th International Joint Conference on As future work, we aim to extend the framework Natural Language Processing, ACL/IJCNLP 2021, by considering additional datasets in different and pages 759–766. Association for Computational Lin- possibly low-resource languages, which require guistics. different pre-processing strategies and would al- Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora low researchers to investigate the peculiarities of Nozza, and Elisabetta Fersini. 2021b. Cross-lingual different topic modeling methods. contextualized topic models with zero-shot learning. In Proceedings of the 16th Conference of the Euro- pean Chapter of the Association for Computational Linguistics: Main Volume, EACL 2021, pages 1676– References 1683. Association for Computational Linguistics. Loulwah AlSumait, Daniel Barbará, James Gentle, and Carlotta Domeniconi. 2009. Topic Significance David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Ranking of LDA Generative Models. In Machine 2003. Latent dirichlet allocation. Journal of Ma- Learning and Knowledge Discovery in Databases, chine Learning Research, 3:993–1022. European Conference, ECML PKDD 2009, volume David M Blei. 2012. Probabilistic topic models. 
References

Loulwah AlSumait, Daniel Barbará, James Gentle, and Carlotta Domeniconi. 2009. Topic Significance Ranking of LDA Generative Models. In Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, volume 5781 of Lecture Notes in Computer Science, pages 67–82. Springer.

Francesco Archetti and Antonio Candelieri. 2019. Bayesian Optimization and Data Science. Springer International Publishing.

Federico Bianchi and Dirk Hovy. 2021. On the gap between adoption and understanding in NLP. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3895–3901.

Federico Bianchi, Silvia Terragni, and Dirk Hovy. 2021a. Pre-training is a hot topic: Contextualized document embeddings improve topic coherence. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, pages 759–766. Association for Computational Linguistics.

Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, and Elisabetta Fersini. 2021b. Cross-lingual contextualized topic models with zero-shot learning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, EACL 2021, pages 1676–1683. Association for Computational Linguistics.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.

David M. Blei. 2012. Probabilistic topic models. Communications of the ACM, 55(4):77–84.

Jordan L. Boyd-Graber, Yuening Hu, and David M. Mimno. 2017. Applications of topic models. Foundations and Trends in Information Retrieval, 11(2-3):143–296.

Adji Bousso Dieng, Francisco J. R. Ruiz, and David M. Blei. 2020. Topic modeling in embedding spaces. Transactions of the Association for Computational Linguistics, 8:439–453.

Thanh-Nam Doan and Tuan-Anh Hoang. 2021. Benchmarking neural topic models: An empirical study. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4363–4368. Association for Computational Linguistics.

Caitlin Doogan and Wray L. Buntine. 2021. Topic model or topic twaddle? Re-evaluating semantic interpretability measures. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, pages 3824–3848. Association for Computational Linguistics.

Bruno Giovanni Galuzzi, Ilaria Giordani, Antonio Candelieri, Riccardo Perego, and Francesco Archetti. 2020. Hyperparameter optimization for recommender systems through Bayesian optimization. Computational Management Science, pages 1–21.

Derek Greene and Pádraig Cunningham. 2006. Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering. In Proceedings of the 23rd International Conference on Machine Learning (ICML'06), pages 377–384. ACM Press.

Tim Head, Gilles Louppe, MechCoder, Iaroslav Shcherbatyi, et al. 2018. scikit-optimize/scikit-optimize: v0.5.2.

Kirthevasan Kandasamy, Karun Raju Vysyaraju, Willie Neiswanger, Biswajit Paria, Christopher R. Collins, Jeff Schneider, Barnabás Póczos, and Eric P. Xing. 2020. Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly. Journal of Machine Learning Research, 21:81:1–81:27.
Jey Han Lau, David Newman, and Timothy Baldwin. 2014. Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, pages 530–539.

Daniel D. Lee and H. Sebastian Seung. 2000. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS) 2000, pages 556–562. MIT Press.

Kar Wai Lim and Wray L. Buntine. 2014. Bibliographic analysis with the citation network topic model. In Proceedings of the Sixth Asian Conference on Machine Learning, ACML 2014.

Biswajit Paria, Kirthevasan Kandasamy, and Barnabás Póczos. 2019. A Flexible Framework for Multi-Objective Bayesian Optimization using Random Scalarizations. In Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI), volume 115 of Proceedings of Machine Learning Research, pages 766–776, Tel Aviv, Israel. AUAI Press.

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3980–3990. Association for Computational Linguistics.

Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando De Freitas. 2015. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175.

Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. Practical Bayesian Optimization of Machine Learning Algorithms. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems, pages 2960–2968.

Silvia Terragni and Elisabetta Fersini. 2021. An empirical analysis of topic models: Uncovering the relationships between hyperparameters, document length and performance measures. In Recent Advances in Natural Language Processing (RANLP).

Silvia Terragni, Elisabetta Fersini, and Enza Messina. 2020a. Constrained relational topic models. Information Sciences, 512:581–594.

Silvia Terragni, Debora Nozza, Elisabetta Fersini, and Enza Messina. 2020b. Which matters most? Comparing the impact of concept and document relationships in topic models. In Proceedings of the First Workshop on Insights from Negative Results in NLP, pages 32–40.

Silvia Terragni, Elisabetta Fersini, Bruno Giovanni Galuzzi, Pietro Tropeano, and Antonio Candelieri. 2021a. OCTIS: Comparing and Optimizing Topic Models is Simple! In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, EACL 2021, pages 263–270. Association for Computational Linguistics.

Silvia Terragni, Elisabetta Fersini, and Enza Messina. 2021b. Word embedding-based topic similarity measures. In Natural Language Processing and Information Systems - 26th International Conference on Applications of Natural Language to Information Systems, NLDB 2021, volume 12801 of Lecture Notes in Computer Science, pages 33–45. Springer.

William Webber, Alistair Moffat, and Justin Zobel. 2010. A similarity measure for indefinite rankings. ACM Transactions on Information Systems, 28(4):20:1–20:38.