Patent-Based Import Substitution Analysis with Additively Regularized Topic Models

Maria Milkova [0000-0002-9393-1044]
Central Economics and Mathematics Institute of the Russian Academy of Sciences, 47 Nakhimovsky Prospect, Moscow, 117418, Russia
m.a.milkova@gmail.com

Abstract. The rapid accumulation of textual data forces the use of various methods to present the structure of available information. One of these methods is topic modeling. We apply Additively Regularized Topic Models (ARTM) to analyze an import substitution program based on patent data. The program includes plans for 22 industries and contains more than 1500 products and technologies proposed for import substitution. Patent search based on ARTM makes it possible to search by whole blocks of a priori information (the terms of the industrial import substitution plans) at once and to obtain, as output, a selection of relevant documents for each industry. This approach not only provides a comprehensive picture of the effectiveness of the program as a whole, but also yields more detailed information about which groups of products and technologies have been patented. Importantly, topic modeling also addresses the synonymy and homonymy of words.

Keywords: Topic search, Topic modeling, Import substitution, Patent search, Patent analysis, Additively Regularized Topic Models, ARTM.

1 Introduction

Currently, in the Digital Age, information accumulates rapidly, and the desire to develop effective ways to perceive its essence and to screen out unnecessary information is natural. A disordered way of working with information and the lack of necessary skills and tools are key factors preventing the recognition of future innovations and the prediction of their consequences [1]. Thus, an approach to the perception of information is needed that presents a road map, the structure of the direction being studied.
Proceedings of the 10th International Scientific and Practical Conference named after A. I. Kitov "Information Technologies and Mathematical Methods in Economics and Management" (IT&MM-2020), October 15-16, 2020, Moscow, Russia. © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

Considering in this work information in text form, we note that its overabundance is present not only on the Internet, but also in the scientific community [2], the legal field [3], and literature [4]. Various clustering methods have been well studied for obtaining information about the structure of large amounts of text data: bibliometric analysis [5], clustering of social network users [6], analysis of discourse and sentiment of messages [7, 8], analysis of legal documents [9], etc. However, the changing digital reality requires a revision of approaches to the semantic compression of information. Firstly, with regard to textual data, the synonymy and homonymy of words must be taken into account. Secondly, a search must take into account the information that we already have.

These requirements are satisfied by topic modeling, a modern tool that determines the structure of a collection of text documents by identifying hidden topics in the documents, as well as the terms (words or phrases) that characterize each topic. In probabilistic topic modeling a document can relate to several topics at once with certain probabilities, just as a term can define a particular topic with different probabilities. Each document is described by a discrete distribution over topics, and each topic is described by a discrete distribution over terms. Presenting the results in this form yields a road map of the direction of interest and significantly increases the precision and recall of the search [10].
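These two kinds of distributions can be illustrated with a toy example (the numbers below are invented for illustration and are not taken from the model in this paper): a document's term distribution is the topic-weighted mixture of the topics' term distributions.

```python
import numpy as np

# Toy illustration (hypothetical numbers): 4 terms, 2 topics, 3 documents.
# Each column of phi is a distribution over terms for one topic;
# each column of theta is a distribution over topics for one document.
phi = np.array([[0.5, 0.0],
                [0.4, 0.1],
                [0.1, 0.3],
                [0.0, 0.6]])          # p(w|t), columns sum to 1
theta = np.array([[1.0, 0.5, 0.0],
                  [0.0, 0.5, 1.0]])   # p(t|d), columns sum to 1

# A document mixes topics, so its term distribution is a convex combination:
# p(w|d) = sum over t of p(w|t) * p(t|d)
p_wd = phi @ theta
print(p_wd[:, 1])  # the 50/50 mixed document: [0.25 0.25 0.2  0.3 ]
```

Document 0 here is purely about topic 0, document 2 purely about topic 1, while document 1 mixes both; every column of the product is again a valid distribution over terms.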
Over the past decade, the concept of topic search has been developing [11, 12]. This type of search helps to identify topics of real interest by observing the most informative terms in the estimated topics.

This article demonstrates an example of applying topic search to patent analysis, an integral part of Foresight research, of individual studies of the prospects of innovative development, of technological trends in various fields, etc. Our research contributes to this field by providing semi-supervised topic search based on different blocks of a priori information.

2 Literature Review

Topic modeling has been developing intensively since the late 1990s. An important milestone in the development of probabilistic text modeling is the Probabilistic Latent Semantic Analysis (PLSA) model described in [13]. PLSA is based on the principle of maximum likelihood and was developed as an alternative to classical text clustering methods based on calculating distance functions.

However, PLSA had a number of significant limitations [14], which were eliminated in the Latent Dirichlet Allocation (LDA) model proposed in [15]. LDA is a generative probabilistic model in which documents are presented as a probabilistic mixture of hidden topics (each word in a document is generated by some latent topic), while the distribution of words in each topic is explicitly modeled, as is the prior distribution of topics in the document.

The literature shows that LDA is the leader among probabilistic topic models due to its numerous generalizations, extensions, and applications to the analysis of collections of text documents [16-20]. However, in [21, 22], where the view of PLSA and LDA is critically revised, it is noted that the widespread use of LDA is explained rather by its purely mathematical convenience for Bayesian learning. It was emphasized that the prior Dirichlet distributions and their generalizations have no convincing linguistic justification.
Moreover, the transition from a generative model to an algorithm for adjusting its parameters requires rather cumbersome calculations, which become much more complicated when more complex prior distributions are introduced or when several linguistic phenomena are modeled jointly. For these reasons, the Additive Regularization of Topic Models (ARTM) approach developed in [21] has received a powerful impulse. ARTM is a multicriteria approach that presents the topic modeling problem as an ill-posed optimization problem requiring the introduction of a regularizer: an additional criterion that takes into account the specific features of the applied problem or knowledge of the subject area [21].

Currently, two directions of development of topic models can be outlined: one based on Bayesian learning (the LDA model) and one based on additive regularization. In [21], topic models previously developed within the Bayesian approach are revised; for each of them, a corresponding regularizer is found that leads to the same or a very similar model learning algorithm. Compared to the Bayesian approach, ARTM radically simplifies the inference of the algorithm and allows regularizers to be combined in arbitrary combinations. Recent studies have also shown the superiority of ARTM over LDA in terms of the quality of the extracted topics (see, for example, [23], where ARTM and LDA are compared on the task of monitoring ethnically determined discourse in social networks).

The use of topic modeling for the analysis of patent data has been gaining popularity in recent years. Research publications in the field of patent analysis show the effectiveness both of text mining methods in general [24] and of topic modeling in particular [25, 26].
The construction of topic models is used to get an idea of patenting in an industry [27], to identify technological trends [28], and to develop specialized software products for topic-based patent analysis [26].

3 Materials and methods

From 2015 to 2018, the Ministry of Industry and Trade (Minpromtorg) of Russia approved import substitution programs [29] for a range of economic industries. The programs govern action plans for import substitution in 22 economic sectors (hereinafter, the Plans). For each industry, a list of goods and technologies has been compiled, and for each item its own target share of imports by 2020 has been established. Simple statistics on the characteristics of the Plans are shown in Table 1.

Table 1. Import substitution plans characteristics.

Characteristic              Value
Number of plan items:
  Min.-Max.                 4-602
  Mean                      70.3
  Median                    41
  Sum                       1553
Import share:
  Median (fact)             90%
  Median (plan)             15%

Today it is important to present some results of import substitution based on the analysis of patent data. During the implementation of the Program, a number of publications appeared in the scientific community evaluating the possibilities of import substitution for certain goods [30-32]. Despite the crucial importance of a detailed analysis in each development area, it is useful to have a general structure of the results. An approach covering all sectors at once makes it possible both to demonstrate the results of the program as a whole and to give a general idea of the state of various sectors of the economy (based on patent data).

To analyze the implementation of the import substitution plan, it is necessary to obtain information on all 1553 items of the Plan, which requires a fundamentally different approach to patent search. Currently, there are various approaches for building topic models [14], [33, 34].
In our work we focus on Additive Regularization of Topic Models, as it provides a convenient way for semi-supervised learning and greater flexibility in constructing topic models with given properties [8]. In this article we give only the basics of multimodal ARTM, as in [35].

Let us denote a finite collection of documents by D, a finite set of topics by T, and a finite set of modalities by M. In our work we use two modalities: words (unigrams) and the most common bigrams. Each modality m ∈ M has its own dictionary of tokens W^m, and W = ⋃_{m∈M} W^m. Each document d ∈ D is a sequence of tokens from W. According to the bag-of-words hypothesis, we take into account only how many times the token w appears in the document d (denoted n_dw).

In ARTM, topic modeling is considered as a special case of approximate stochastic matrix factorization. Learning a factorized representation of a text collection is an ill-posed problem with an infinite set of solutions. A typical regularization approach in this case is to impose problem-specific constraints in the form of additive terms in the optimization criterion [36].

Given the matrix F = (p̂(w|d)) of observed token frequencies, find its approximate factorization F ≈ ΦΘ by the matrix Φ = (φ_wt) of token probabilities for the topics and the matrix Θ = (θ_td) of topic probabilities for the documents:

  p(w|d) = Σ_{t∈T} φ_wt θ_td,   (1)

where |T| is the number of topics in the model (in our case |T| = 22).

Additive regularization narrows the set of solutions of (1) by maximizing the weighted sum of the modality log-likelihoods and the regularizers R_i(Φ, Θ):

  Σ_{m∈M} τ_m Σ_{d∈D} Σ_{w∈W^m} n_dw ln Σ_{t∈T} φ_wt θ_td + Σ_i τ_i R_i(Φ, Θ) → max over Φ, Θ,   (2)

under non-negativity and normalization constraints for all columns of the Φ and Θ matrices. The regularization coefficients τ_m are used to balance the importance of the different modalities.

This optimization problem can be solved using the EM-algorithm. First, an initial approximation for Φ, Θ is selected. At the E-step, the auxiliary variables p_tdw = p(t|d,w) are calculated:

  p_tdw = norm_{t∈T}(φ_wt θ_td),   (3)

where the operator norm_{t∈T}(x_t) = max(x_t, 0) / Σ_{s∈T} max(x_s, 0) transforms a real vector into a vector representing a discrete distribution (by zeroing out the negative elements and normalizing).
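The norm operator and the E-step (3) can be sketched in a few lines; numpy is assumed, and the probability values below are toy illustrations rather than quantities from the paper's model.

```python
import numpy as np

def norm(x):
    """ARTM's norm operator: zero out negative elements, then normalize
    to a discrete distribution; an all-nonpositive vector maps to zeros."""
    x = np.maximum(x, 0.0)
    s = x.sum()
    return x / s if s > 0 else x

# E-step: topic responsibilities for a fixed token w in a fixed document d,
# p(t|d,w) = norm over t of (phi_wt * theta_td).
phi_w = np.array([0.3, 0.1, 0.0])    # p(w|t) for one token w (toy values)
theta_d = np.array([0.5, 0.5, 0.0])  # p(t|d) for one document d (toy values)
p_tdw = norm(phi_w * theta_d)
print(p_tdw)  # -> approximately [0.75, 0.25, 0.0]
```

The zeroing step matters once regularizers are added: a regularizer's gradient term can push an argument of norm below zero, and the operator then projects the result back onto the probability simplex.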
At the M-step, Φ and Θ are updated:

  φ_wt = norm_{w∈W^m} ( n_wt + φ_wt ∂R/∂φ_wt ),   n_wt = Σ_{d∈D} τ_{m(w)} n_dw p_tdw,   (4)

  θ_td = norm_{t∈T} ( n_td + θ_td ∂R/∂θ_td ),   n_td = Σ_{w∈d} τ_{m(w)} n_dw p_tdw,   (5)

where m(w) is the modality of the term w, and R = Σ_i τ_i R_i. Steps (3)-(5) are repeated in a loop until convergence.

Regularizers are aimed at taking into account the linguistic features of the text and at increasing the interpretability of topics. The most common regularizers are the sparsing, smoothing, and decorrelation regularizers of topics [21]. For our task, regularizers need to be constructed that group the terms of each Plan into its own topic. Thus, we use 22 smoothing regularizers for the matrix Φ (both for words and bigrams) that encourage the terms of each Plan i to appear in the related topic t_i:

  R_i(Φ) = β Σ_{w∈S_i} ln φ_{w t_i},   i = 1, ..., 22,   (6)

where S_i is the set of terms (the "white list") of Plan i. The convenience of ARTM is that a regularization term yields a simple additive modification of the M-step. For our task, this modification means that the parameter β is added to the counts n_wt of the white-list terms (Plan terms) at each iteration of the EM algorithm. The value of β is selected experimentally.

The main task in constructing the model is to select the regularization strategy (the regularization coefficient as a function of the iteration number) and the model quality criteria. Following [36], we use such quality criteria as the perplexity (the degree of agreement of the model with a given dictionary W, P = exp(−(1/n) Σ_{d∈D} Σ_{w∈d} n_dw ln p(w|d)), where n is the total number of tokens); the degree of sparseness of the matrices Φ, Θ (the proportion of zero elements in the matrix); the size of the topic kernel W_t = {w : p(t|w) > δ} (the set of terms with a high conditional probability of the topic); the purity of the topic (how determining the terms inside the topic are: the total probability of the kernel terms, Σ_{w∈W_t} p(w|t)); and the contrast of the topic (how well the topic kernel distinguishes it from the rest: the average probability of the kernel terms belonging to this particular topic, (1/|W_t|) Σ_{w∈W_t} p(t|w)).

The reported study was funded by RFBR according to the research project No. 20-07-22059.
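Under the simplifying assumptions of a single modality and only the smoothing regularizer (6), the EM iterations (3)-(5) reduce to the sketch below. The function name, toy corpus, and β value are illustrative; this is a minimal demonstration of the scheme, not the BigARTM implementation used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def em_artm(n_dw, num_topics, white_lists=None, beta=0.0, iters=50):
    """Rationalized EM for a single-modality topic model with smoothing.
    n_dw: (D, W) term-frequency matrix; white_lists: optional dict
    {topic: set of term indices} whose counts are boosted by beta
    at each M-step, as in regularizer (6)."""
    D, W = n_dw.shape
    phi = rng.random((W, num_topics)); phi /= phi.sum(axis=0)      # p(w|t)
    theta = rng.random((num_topics, D)); theta /= theta.sum(axis=0)  # p(t|d)
    for _ in range(iters):
        n_wt = np.zeros((W, num_topics))
        n_td = np.zeros((num_topics, D))
        for d in range(D):
            # E-step (3): p(t|d,w) proportional to phi_wt * theta_td
            p = phi * theta[:, d]                       # (W, T)
            p /= np.maximum(p.sum(axis=1, keepdims=True), 1e-12)
            n_wt += n_dw[d][:, None] * p                # expected counts
            n_td[:, d] = (n_dw[d][:, None] * p).sum(axis=0)
        # M-step (4): the smoothing regularizer adds beta to white-list counts
        if white_lists:
            for t, terms in white_lists.items():
                n_wt[list(terms), t] += beta
        phi = n_wt / np.maximum(n_wt.sum(axis=0), 1e-12)
        theta = n_td / np.maximum(n_td.sum(axis=0), 1e-12)  # M-step (5)
    return phi, theta

# Toy corpus: 3 documents over 5 terms; terms 0-1 play the role of one
# Plan's white list, steered into topic 0.
n_dw = np.array([[5, 4, 0, 0, 1],
                 [0, 1, 5, 4, 0],
                 [4, 5, 1, 0, 0]], dtype=float)
phi, theta = em_artm(n_dw, num_topics=2, white_lists={0: {0, 1}}, beta=1.0)
```

The point of the sketch is the M-step: regularizer (6) enters the update only as the additive β on the white-list rows of the count matrix, exactly the "simple additive modification" described above.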
4 Model construction and Results

Patents for inventions and utility models issued over a 3.5-year period (January 2016 - June 2019) were collected: a total of 152718 documents, comprising 120768 inventions and 31950 utility models. The ARTM was built in Python with the open-source library BigARTM [37]. The model was based on the Titles and Abstracts of the patents, represented as unigrams (single words) and the most frequent bigrams (two-word phrases occurring in the Title and Abstract at least twice). The experimentally chosen modality weights were 1.0 for words and 5.0 for bigrams. The coefficient β of the 22 smoothing regularizers was likewise chosen experimentally.

The final model converged in 40 iterations with a perplexity of 630.7, an average topic purity of 0.992, and an average topic contrast of 0.976; the sparseness of the unigram and bigram Φ matrices and of Θ, and the kernel sizes, were also tracked as quality criteria.

Based on the ranked probabilities in the columns of Θ, we selected patent documents for each of the 22 industries (threshold = 0.6), in accordance with the topic characterized by the set of words and phrases from the corresponding Plan. In addition to the standard automatically calculated metrics, the quality of the model was also evaluated by assessors, who determined how relevant each selected document is. The value q = 1 was assigned if the patent exactly corresponded to one of the import substitution items declared in the Plan; q = 0.5 if the patent was associated with one of the items of the Plan; and q = 0 if it did not correspond to any item of the Plan. This technique has been successfully used in [8, 10].

For documents with q = 1 or q = 0.5, a key phrase was selected that characterizes the document's belonging to an item of the Plan (Table 2).
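The selection-and-assessment step can be sketched as follows. The Θ values and assessor marks below are hypothetical toy data, and aggregating the marks as a plain average and sum per industry is one plausible reading of the paper's scheme.

```python
import numpy as np

# Hypothetical p(t|d) matrix: 3 topics (industries) x 6 patent documents.
theta = np.array([[0.7, 0.2, 0.1, 0.9, 0.3, 0.0],
                  [0.2, 0.7, 0.1, 0.1, 0.3, 0.4],
                  [0.1, 0.1, 0.8, 0.0, 0.4, 0.6]])

def select_documents(theta, threshold=0.6):
    """Per topic: documents whose topic probability meets the threshold,
    ranked by that probability in descending order."""
    return {t: sorted(np.flatnonzero(theta[t] >= threshold).tolist(),
                      key=lambda d: -theta[t, d])
            for t in range(theta.shape[0])}

selected = select_documents(theta)   # {0: [3, 0], 1: [1], 2: [2, 5]}

# Hypothetical assessor marks for the selected documents:
# q = 1 exact match to a Plan item, q = 0.5 related, q = 0 irrelevant.
q = {3: 1.0, 0: 0.5, 1: 0.0, 2: 1.0, 5: 0.5}

for topic, docs in selected.items():
    marks = [q[d] for d in docs]
    print(topic, docs, sum(marks) / len(marks), sum(marks))  # avg and total
```

With the threshold at 0.6, documents that spread their probability mass across several industries (such as documents 4 and 5 under topics 0 and 1 here) are simply not attributed to any of them.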
Thus, each industry was characterized by its import substitution categories (key phrases), the total number of categories k, the average and total assessor mark, and the total score, where N is the total number of items of the Plan. Industry ranking results are presented in Fig. 1.

Table 2. Patentable import substitution categories.

Industry: Import substitution category (RU)
Car industry: internal combustion engine
Civil aircraft industry: (none)
Baby goods: furniture for children; games and toys; sport complexes; baby clothes; children's creativity
Light industry: non-woven materials; protective clothing; wool processing
Timber industry: cellulose treatment; paper, cardboard
Food processing industry: machinery for food and grain processing
Medical industry: sterilization and disinfection; endoscopic devices; injection needles; implantable pumps
Oil and gas engineering: hydrotreating catalysts; drilling of wells; hydrocracking catalysts; hydrocarbon processing; hydraulic fracturing; catalytic cracking catalysts
Conventional Arms Industry: cartridges; sports weapons
Electronic industry: (none)
Agricultural and forestry engineering: bearings; combine harvester; baler
Machine tool industry: milling machine; lathe; boring machine; spindles; finish grinding; waterjet cutting; CNC machines
Building materials and building construction: ceramic mass for tiles; thermal insulation materials; crushed stone and mastic asphalt concrete road surface
Road construction technique: hydraulic equipment; front loaders; bulldozers; excavator; trailer and semi-trailer; crane chassis; municipal engineering
Shipbuilding industry: mover; flange screw
Transport machine building: cistern wagon; brake system; wagon trolleys; covered wagon
Heavy engineering: mine support; refrigeration units
Pharmaceutical industry: inosine + nicotinamide + riboflavin + succinic acid; bismuth potassium ammonium citrate; drotaverine; iohexol; lopinavir + ritonavir; ethylmethylhydroxypyridine succinate; rocuronium bromide; digoxin; 1-carbamoylmethyl-4-phenyl-2-pyrrolidone; fenspiride; isoniazid; lappaconitine hydrobromide; standard immunoglobulin; bromodihydrochlorophenylbenzodiazepine; desmopressin; fingolimod; anastrozole
Chemical industry: paints and varnishes; sealing materials; epoxy composite; adhesive materials; polyethylene terephthalate; ultra-high-molecular-weight polyethylene; polymer composites
Non-ferrous metallurgy: aluminum alloy; aluminum electrolysis; aluminum ligature; aluminum hydroxide; aluminum powder; aluminium foil; aluminum rods; anode mass
Ferrous metallurgy: refractories; tubing; threaded connections; drill pipes; pipes based on chromium-nickel alloys; casing
Power engineering: current transformers

Fig. 1. Industry ranking results

Thus, in the context of granted patents for inventions and utility models, the industries demonstrating the best import substitution indicators are: Ferrous metallurgy, Baby goods, Road construction technique, Building materials and building construction, Conventional Arms Industry, etc. The industries currently unable to comply with the import substitution plan (based on patent documents) are: Civil aircraft industry, Electronic industry, Power engineering, Shipbuilding industry, etc.

5 Discussion

Intellectual property in the form of patents plays a vital role in today's economy.
However, the constantly growing volume of information, including patent information, significantly complicates its effective monitoring and analysis. Currently, many search and analytical systems (for example, Yandex.Patents, Google Patents, PatSeer) use advanced achievements of computational linguistics, including methods of semantic text analysis. Modern search engines are able to find similar patents and related patents (those that mention a document of interest to the user, or the documents it refers to). The search for similar patents is carried out not only by keywords, but also by meaning. It should be noted, however, that modern patent search and analytics systems are designed to obtain information about objects one at a time. If there are many objects of interest, the search requires significant time investment. To analyze the implementation of the import substitution plan, it is necessary to obtain information on all 1553 items of the Plan, which obviously requires a fundamentally different approach to patent search.

The purpose of topic modeling of patent documents is to simplify access to documents of interest from the perspective of import substitution. The constructed model provides a general picture of the implementation of import substitution as a whole within the considered time window. Importantly, the resulting structure allows the results to be detailed when necessary: for example, to identify the share of individual patent holders who will not be able to become the main agents for capturing market niches and will not be able to compete with large foreign companies, the share of non-valid patents, etc. This approach is a kind of "close-up" of patent search, which can serve either as the final goal or as a starting point for a more detailed analysis.

6 Conclusions

The results demonstrate the effectiveness of the new patent search method based on topic modeling.
The approach makes it possible to search by blocks of a priori information (in our case, the items of all twenty-two industrial import substitution plans at once) and, at the output, to receive a selection of relevant documents for each of the industries. Applying topic modeling also addresses the synonymy and homonymy of words.

In today's constantly changing digital reality, the rate of information accumulation is so rapid that it requires a revision of our approaches to the semantic compression of information. In order to comprehensively cover and analyze the entire spectrum of ongoing changes, increased demands must be placed on information retrieval methods. An innovative search approach must flexibly take into account the large amount of already accumulated knowledge and the a priori requirements for the results. The results, in turn, should immediately represent a road map of the studied direction, with the possibility of as much detail as necessary. The topic modeling approach meets all these requirements and thereby streamlines the nature of working with information, increases the efficiency of knowledge extraction, and helps avoid cognitive biases in the perception of information, which is important at both the micro and macro levels.

Acknowledgments

The reported study was funded by RFBR according to the research project No. 19-010-00293.

References

1. Milovidov, V.: Hearing the sound of the wave: what makes it difficult to anticipate innovation? [In Russian]. Forsajt 12(1), 88-97 (2019). doi: 10.17323/2500-2597.2018.1.88.97
2. Nedumov, Ya.R., Kuznecov, S.D.: Exploratory search for scientific articles [In Russian]. Trudy ISP RAN 30(6), 171-198 (2018). doi: 10.15514/ISPRAS-2018-30(6)-10
3. Pagallo, U., Palmirani, M., Casanovas, P., Sartor, G., Villata, S.: Introduction: Legal and Ethical Dimensions of AI, NorMAS, and the Web of Data. In: Pagallo, U., Palmirani, M., Casanovas, P., Sartor, G., Villata, S. (eds.)
Lecture Notes in Artificial Intelligence. Springer (2018). doi: 10.1007/978-3-030-00178-0_1
4. Moretti, F.: Distant Reading. Verso, London (2013)
5. Gibson, E., Daim, T., Garces, E., Dabic, M.: Bibliometric analysis as a tool for identifying common and emerging methods of technological Foresight. Forsajt 12(1), 6-24 (2018). doi: 10.17323/2500-2597.2018.1.6.24
6. Halibas, A.S., Shaffi, A.S., Mohamed, M.A.: Application of text classification and clustering of Twitter data for business analytics. In: Majan International Conference (MIC), Muscat, pp. 1-7 (2018). doi: 10.1109/MINTC.2018.8363162
7. Krishna, A., Aich, A., Akhilesh, V., Hegde, C.: Analysis of Customer Opinion Using Machine Learning and NLP Techniques. International Journal of Advanced Studies of Scientific Research 3(9), 128-132 (2018). https://ssrn.com/abstract=3315430
8. Apishev, M., Koltcov, S., Koltsova, O., Nikolenko, S., Vorontsov, K.: Mining Ethnic Content Online with Additively Regularized Topic Models. Computación y Sistemas 20(3), 387-403 (2016). doi: 10.13053/CyS-20-3-2473
9. Sulea, O.-M., Zampieri, M., Malmasi, S., Vela, M., Dinu, L.P., van Genabith, J.: Exploring the Use of Text Classification in the Legal Domain. In: Proceedings of the 2nd Workshop on Automated Semantic Analysis of Information in Legal Texts (ASAIL), London, United Kingdom (2017). arXiv:1710.09306
10. Ianina, A.O., Vorontsov, K.V.: Multimodal topic models for exploratory search in a collective blog [In Russian]. Mashinnoe obuchenie i analiz dannyh 2(2), 173-186 (2016). doi: 10.21469/22233792.2.2.04
11. Grant, C.E., George, C.P., Kanjilal, V., Nirkhiwale, S., Wilson, J.N., Wang, D.Z.: A Topic-Based Search, Visualization, and Exploration System. In: FLAIRS Conference, pp. 43-48. AAAI Press, Massachusetts (2015)
12. Eisenstein, J., Chau, D.H., Kittur, A., Xing, E.P.: TopicViz: interactive topic exploration in document collections. In: CHI EA '12, pp. 2177-2182.
Association for Computing Machinery, New York, NY, USA (2012). doi: 10.1145/2212776.2223772
13. Hofmann, T.: Probabilistic Latent Semantic Analysis. In: Uncertainty in Artificial Intelligence, UAI'99, Stockholm (1999). doi: 10.1145/312624.312649
14. Daud, A., Li, J., Zhu, L., Muhammad, F.: A generalized topic modeling approach for maven search. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, Q.M. (eds.) Advances in Data and Web Management. APWeb/WAIM 2009. LNCS, vol. 5446, pp. 138-149. Springer, Heidelberg (2009). doi: 10.1007/978-3-642-00672-2_14
15. Blei, D., Ng, A., Jordan, M.: Latent Dirichlet allocation. Journal of Machine Learning Research 3 (2003). doi: 10.1162/jmlr.2003.3.4-5.993
16. Chemudugunta, C., Smyth, P., Steyvers, M.: Modeling general and specific aspects of documents with a probabilistic topic model. In: Advances in Neural Information Processing Systems, vol. 19, pp. 241-248. MIT Press (2006)
17. Mimno, D., Wallach, H.M., Naradowsky, J., Smith, D.A., McCallum, A.: Polylingual Topic Models. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, pp. 880-889 (2009)
18. Ramage, D., Hall, D., Nallapati, R., Manning, C.D.: Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 248-256 (2009)
19. Blei, D.M., Lafferty, J.: Dynamic topic models. In: Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh, Pennsylvania, USA (2006). doi: 10.1145/1143844.1143859
20. Wang, C., Blei, D.M., Heckerman, D.: Continuous time dynamic topic models. In: Proceedings of Uncertainty in Artificial Intelligence (UAI), Helsinki, Finland (2008). arXiv:1206.3298
21. Vorontsov, K.V., Potapenko, A.A.: Additive Regularization of Topic Models.
Machine Learning, Special Issue "Data Analysis and Intelligent Optimization", 1-21 (2014). doi: 10.1007/s10994-014-5476-6
22. Potapenko, A.A., Vorontsov, K.V.: Robust PLSA Performs Better Than LDA. In: 35th European Conference on Information Retrieval, ECIR 2013, Moscow, Russia, 24-27 March 2013. LNCS, pp. 784-787. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-36973-5_84
23. Apishev, M., Koltsov, S., Koltsova, O., Nikolenko, S., Vorontsov, K.: Additive Regularization for Topic Modeling in Sociological Studies of User-Generated Texts. In: Lecture Notes in Computer Science (2017). doi: 10.1007/978-3-319-62434-1_14
24. Tseng, Y.-H., Lin, C.-J.: Text mining techniques for patent analysis. Information Processing & Management 43, 1216-1247 (2007). doi: 10.1016/j.ipm.2006.11.011
25. Chen, L., Shang, W., Yang, G., Zhang, J., Lei, X.: A topic model integrating patent classification information for patent analysis. Geomatics and Information Science of Wuhan University 41, 123-126 (2016)
26. Tang, J., Wang, B., Yang, Y., Hu, P., Zhao, Y., Yan, X., Gao, B., Huang, M., Xu, P., Li, W., Usadi, A.K.: PatentMiner: Topic-driven Patent Analysis and Mining. In: Proceedings of KDD '12, pp. 1366-1374. Beijing, China (2012). doi: 10.1145/2339530.2339741
27. Suominen, A., Toivanen, H., Seppänen, M.: Firms' knowledge profiles: Mapping patent data with unsupervised learning. Technological Forecasting and Social Change 115, 131-142 (2017). doi: 10.1016/j.techfore.2016.09.028
28. Choi, D., Song, B.: Exploring Technological Trends in Logistics: Topic Modeling-Based Patent Analysis. Sustainability 10(8), 1-26 (2018). doi: 10.3390/su10082810
29. Ministry of Industry and Trade of Russia: Sectoral plans for import substitution in twenty-two industries. https://gisp.gov.ru/plan-import-change/, last accessed 2020/10/10
30. Jerivanceva, T.N.:
The use of patent analysis to assess the prospects of import substitution on the example of domestic retractors and crosslinking products [In Russian]. Jekonomika nauki 4, 261-275 (2016). doi: 10.22394/2410-132X-2016-2-4-261-275
31. Jerivanceva, T.N.: Assessment of the competitiveness of Russian scientific and technological backlogs in the field of creating medical instruments [In Russian]. Jekonomika nauki 1, 53-69 (2017). doi: 10.22394/2410-132X-2017-3-1-52-68
32. Andrejchikov, A.V., Teveleva, O.V., Nevolin, I.V., Milkova, M.A., Kravchuk, I.S.: Methodology for conducting search research to identify opportunities for import substitution of high-tech products based on world patent and financial information resources [In Russian]. Jekonomika i predprinimatel'stvo 4, 157-167 (2019)
33. Milkova, M.A.: Topic models as a tool for distant reading [In Russian]. Cifrovaja jekonomika 1(5), 57-69 (2019). doi: 10.34706/DE-2019-01-06
34. Boyd-Graber, J., Hu, Y., Mimno, D.: Applications of Topic Models. Foundations and Trends in Information Retrieval 11(2-3), 143-296 (2017). doi: 10.1561/1500000030
35. Ianina, A., Golitsyn, L., Vorontsov, K.: Multi-objective topic modeling for exploratory search in tech news. In: Filchenkov, A., Pivovarova, L., Žižka, J. (eds.) Artificial Intelligence and Natural Language. AINL 2017. Communications in Computer and Information Science, vol. 789, pp. 181-193. Springer, Cham (2017). doi: 10.1007/978-3-319-71746-3_16
36. Vorontsov, K., Frei, O., Apishev, M., Romov, P., Suvorova, M., Yanina, A.: Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large Collections. In: Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications (TM '15), pp. 29-37. Association for Computing Machinery, New York, USA (2015). doi: 10.1145/2809936.2809943
37. Frei, O., Apishev, M.: Parallel non-blocking deterministic algorithm for online topic modeling. In: Ignatov, D., et al.
(eds.) Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol. 661, pp. 132-144. Springer, Cham (2016). doi: 10.1007/978-3-319-52920-2_13