<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Research and development of linguo-statistical methods for forming a portrait of a subject area</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Oleg</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
							<email>ol-zolot@yandex.ru</email>
							<affiliation key="aff0">
								<orgName type="institution">ANO HE «Russian New University»</orgName>
								<address>
									<settlement>Moscow</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Research and development of linguo-statistical methods for forming a portrait of a subject area</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F8F6D03533681F2A67E4DC4FB8295A97</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T01:45+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The project addresses the fundamental scientific problem of semantic modeling. Within this framework, a methodology is developed for the automated identification of translation links (translation correspondences), as well as hierarchical, synonymous and associative links, from Internet texts, and for the construction of multilingual associative hierarchical portraits of a subject area (MAHPSA), in particular for autonomous uninhabited underwater vehicles (UUV). Accounting for multilingual and heterogeneous resources gives a more complete picture of what is happening in the subject area and makes it possible to identify the sources of ideas, the speed and directions of their spread, significant documents and promising directions. The solution is based on an integrated approach that combines methods of statistics, corpus linguistics and distributional semantics. It is implemented in a technology that involves the development of linguo-statistical mechanisms for forming a multilingual associative hierarchical portrait of a subject area: a dictionary of significant terms of the subject area whose elements are organized into synonymous series (synsets), including translation correspondences, together with associative and hierarchical relationships.</p><p>Keywords: Linguo-statistical methods, associative and hierarchical portrait of the subject area, multilingual integrated ontology, forecasting the spread of ideas, multilingual corpus of the subject area.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The growing volume of information on the Internet significantly complicates information search. Semantic search and the comparison of multilingual documents make it possible to find new trends and ideas, which can significantly reduce the cost of developing and popularizing new areas of science. Using a multilingual associative hierarchical portrait of a subject area (MAHPSA) when comparing documents allows texts to be compared not only by the phrases they share, but also by the objects and processes they describe. MAHPSA makes it possible to determine the semantic similarity of documents even when they have no words in common. It also makes it possible to calculate integrated statistics for a multilingual collection and to identify significant documents and promising areas without translating the documents into a single language. This is important for the automatic processing of large numbers of documents (Big Data). The construction of MAHPSA will make it possible not only to compare documents and search for new ideas, but also to solve other problems that require the rapid analysis of large amounts of information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Technique of automatic formation of a multilingual associative-hierarchical portrait of a subject area</head><p>The essence of the proposed method for forming a multilingual associative-hierarchical portrait of a subject area is the iterative expansion of an initial multilingual dictionary of significant phrases into a hierarchy of multilingual synonymous series (synsets). The method can be stated as the following algorithm: 1) Compiling a collection of multilingual texts by a directed keyword search in databases of scientific documents (for example, Dimensions); 2) Text processing with the Pullenti program: tokenization and metatokenization; 3) Automatic generation of glossaries of terms and megalemmas, with expert quality control of the generated dictionaries;</p><p>4) Automatic selection of topics using thematic modeling methods, formation of a dictionary of subject areas, selection of a set of keywords for each subject area, expert control and topic correction; 5) Formation of a dictionary of key terms mapped to topics; 6) Compilation of frequency dictionaries of subject-area terms (using statistical methods); 7) Compilation of frequency dictionaries of subject-area megalemmas; 8) Building multilingual synsets by combining BabelNet resources and a megalemma dictionary; 9) Building SVPs using a neural network model (a combination of Word2Vec with multilingual recurrent neural networks, RNN) for preprocessed texts; 10) Performing hierarchical clustering using Word2Vec and RNN, taking into account the hierarchical relationships of synsets; 11) Constructing an ordered list of candidate hierarchical relationships from the associative connections of the neural network model; hierarchical relations are viewed and corrected using the Keywen Knowledge Architect resource <ref type="bibr" target="#b0">[1]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology for calculating integral statistics based on MAHPSA</head><p>MAHPSA is created automatically on the basis of statistical analysis of large volumes of Internet texts. The hierarchical connections that make up the MAHPSA form a hierarchy and a classifier that facilitate search and navigation in the multilingual UUV subject area.</p><p>The proposed methodology also includes the integration of various MAHPSAs with multilingual linguistic resources (WordNet, Wikipedia, BabelNet, etc.) to obtain as large a multilingual ontology as possible, with relevant knowledge and improved coverage of terminology in the subject areas under consideration. The combined (integral) ontology contains a hierarchy of synonymous series (synsets) of multilingual terms, including Russian ones, and serves as the basis for constructing a single multilingual vector space in which the semantic proximity of multilingual texts, synsets and terms can be evaluated, similarly to the NASARI and MUFFIN methods. The translation correspondences between the multilingual synsets of MAHPSA are built using Word2Vec technology. The integral ontology makes it possible to calculate integrated multilingual statistics and trends in the use of terms and ideas, which in turn makes it possible to predict the spread of ideas between languages and to determine promising directions. A measure of the semantic proximity of multilingual documents makes it possible to identify implicit links between documents and to determine significant documents, which is necessary for collecting high-quality information from the open Internet and building large, relevant multilingual corpora of texts for the subject area. 
Thus, increasing the size and quality of the integral ontology will make it possible to build a better similarity measure and a better subject corpus of texts; extracting knowledge from that corpus will, in turn, further increase the size and quality of the integral ontology.</p><p>The methodology covers not only the identification of significant documents, but also the identification of trends and of promising areas for the development of science.</p><p>To develop the first version of the integrated statistics methodology based on MAHPSA, it is necessary to do the following: 1) Conduct morphological, syntactic and partial semantic analysis of the text; 2) Select typed objects (named entities); 3) Identify formal elements for representing concepts; 4) Develop a structure and software for storing a multilingual collection of documents; 5) Create dictionaries for storing structured information; 6) Develop neural network algorithms for calculating integrated statistics based on MAHPSA.</p><p>The first version of a program for detecting implicit interlingual connections and assessing the semantic similarity of phrases in different languages has been developed.</p><p>Text processing is carried out using the Pullenti program <ref type="bibr" target="#b1">[2]</ref>. It has won computational linguistics competitions held as part of the Dialogue conference.</p><p>Pullenti is a linguistic processor developed at the Institute of Informatics Problems. It is under continuous refinement and performs morphological, syntactic and partial semantic analysis of text, extracting typed objects (named entities).</p><p>The Pullenti SDK includes the following main blocks: 1) Tokenization: splitting the text into words (tokens), with adjustments (Fig. 
<ref type="figure">1</ref> [2-12]); 2) Morphological analysis: determining the parts of speech of tokens (a POS tagger, i.e. a part-of-speech tagger, that returns all possible interpretations of a word form regardless of its surrounding context). The supported languages are Russian, Ukrainian and English. The block also provides normalization (reduction of a word form to the desired case / gender / number), processing of unknown and new words, and an error-correction mode (Fig. <ref type="figure">2</ref> [2-12]); 3) Named entity recognition <ref type="bibr" target="#b12">[13]</ref> (NER): a set of analyzers that find entities of the corresponding type (persons, organizations, geographical objects, etc.) in sequences of tokens (Fig. <ref type="figure">3</ref> [2-12]); 4) A set of tools for working with numerical data, noun and verb groups, brackets and quotation marks, dictionaries of terms and abbreviations, various checks (for example, equivalence of strings written in Latin and Cyrillic letters) and other useful features that emerged while solving practical problems (Fig. <ref type="figure">4</ref> [2-12]); 5) Derivative dictionary: a dictionary of derivative groups (sets of words that share a root but belong to different parts of speech; one group may contain words in different languages), the government model of a group (what can follow a group), synonymy, etc.; 6) Semantic representation: tokens are structured as a graph with semantic connections in order to solve more complex, meaning-related problems <ref type="bibr" target="#b13">[14]</ref>. For this project, the linguistic processor was modified so that implicit links in documents can be detected more accurately.</p><p>The concept of a token (the Token base class) is at the heart of the Pullenti SDK model. Each token refers to a contiguous fragment of the source text (BeginChar and EndChar positions). 
First, the text is divided into a sequence of text tokens (TextToken); during processing these are then merged into metatokens (MetaToken). A metatoken is a token that has "absorbed" a contiguous sequence of other tokens. Metatokens represent, for example, the places where named entities occur in the text (ReferentToken). Metatokens can also represent various numerical data (numbers written out in words), noun groups (for example, NounPhraseToken is a class inherited from MetaToken), etc. Most of the elements obtained and used during the analysis are metatokens.</p><p>The concept of Pullenti metatokens served as the basis for building dictionaries of megalemmas, each of which can consist of several tokens or metatokens. The megalemma is the basis for comparing meaningful phrases from different languages; that is, the concept of a megalemma is broader than that of a metatoken, since it additionally includes the identification of connections between different languages.</p><p>Megalemma dictionaries are constructed using the method for determining the proximity of terms <ref type="bibr" target="#b10">[11]</ref>. This method makes it possible to form megalemmas on the basis of statistical patterns of term occurrence while forming an associative-hierarchical portrait of a subject area.</p><p>Thematic dictionaries of megalemmas are formed by subject area and serve as the basis for the classification of texts. Megalemma dictionaries are also used to represent knowledge in ontologies and to supplement them automatically with relevant vocabulary.</p><p>The synset was chosen as the formal element for representing concepts. It is the basis of knowledge representation in systems such as WordNet, BabelNet and others, and is a well-established and generally accepted concept <ref type="bibr" target="#b14">[15]</ref>. Synsets can be chained together (megalemmas include synsets).</p><p>Megalemmas are thus represented as chains of synsets. 
The concept of a synset is inherently oriented toward multilingualism.</p><p>The work was carried out in two subject areas: "computer graphics and visualization" and "autonomous uninhabited underwater vehicles".</p><p>Algorithms for the semantic analysis of information have been developed <ref type="bibr" target="#b1">[2]</ref><ref type="bibr" target="#b2">[3]</ref><ref type="bibr" target="#b3">[4]</ref><ref type="bibr" target="#b4">[5]</ref><ref type="bibr" target="#b5">[6]</ref><ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref><ref type="bibr" target="#b9">[10]</ref><ref type="bibr" target="#b10">[11]</ref><ref type="bibr" target="#b14">[15]</ref>. Prototypes of software components for the semantic analysis of textual information have also been developed.</p><p>Implicit links are searched for using the megalemma dictionary. First, the text is processed with the Pullenti program: words are normalized, named entities are extracted (NER, named entity recognition), and dictionaries of tokens and metatokens are formed for the text. Next, a thematic analysis of the text is carried out using megalemma dictionaries. As already mentioned, the megalemma dictionaries associate each megalemma with a specific document and a specific subject area. This makes it possible to classify texts by subject area and to analyze documents statistically for the presence of implicit references. The publication dates of the documents are used to determine which document is the source of a megalemma and which documents refer to it.</p><p>To control the quality of the automatic detection of implicit links, methods of collective intelligence and crowdsourcing were used <ref type="bibr" target="#b16">[17]</ref>. 
It was proposed to check the quality of implicit-link detection using an expert approach.</p><p>The probability of a positive decision is determined by the mathematical model:</p><formula xml:id="formula_0">K_0 = \sum_{i=0}^{(M-1)/2} C_M^{i} \, G_R^{M-i} (1 - G_R)^{i}</formula><p>Here K0 is the probability of a positive decision by a group of M experts, and GR is the probability that a single expert makes the correct decision. The analysis of expert estimates showed a rather high level of detection of implicit links and of determination of the semantic similarity of phrases and documents.</p><p>Software for storing a multilingual collection of documents was developed. A software implementation of thematic modeling methods using dictionaries of megalemmas in subject areas has also been developed <ref type="bibr" target="#b17">[18]</ref>.</p><p>As a result of processing collections of documents, dictionaries of terms and dictionaries of megalemmas are built. Statistics are collected on the use of terms and megalemmas across articles.</p><p>BabelNet is an integration resource based on the following resources: WordNet, Wikipedia, OmegaWiki, Wiktionary, Wikidata, Wikiquote, VerbNet, Microsoft Terminology, GeoNames, ImageNet, FrameNet, WN-Map, Open Multilingual WordNet, WoNeF, Albanet, Arabic WordNet (AWN v2), BulTreeBank WordNet (BTB-WN), Chinese Open WordNet, Chinese WordNet (Taiwan), DanNet, Greek WordNet, Princeton WordNet, Persian WordNet, FinnWordNet, WOLF (WordNet Libre du Français), Hebrew WordNet, Croatian WordNet, IceWordNet, MultiWordNet, ItalWordNet, Japanese WordNet, Multilingual Central Repository, WordNet Bahasa, Open Dutch WordNet, Norwegian WordNet, plWordNet, OpenWN-PT, Romanian WordNet, Lithua.</p><p>BabelNet is fully integrated with Babelfy, a multilingual word sense disambiguation and entity linking system. 
BabelNet is also integrated with the Wikipedia Bitaxonomy <ref type="bibr" target="#b19">[20]</ref>, which is built around two hierarchies: a page hierarchy and a category hierarchy <ref type="bibr" target="#b14">[15]</ref>.</p><p>Integration with BabelNet will be carried out by analogy with the approach that BabelNet itself uses to integrate the resources listed above: automatic mapping, with lexical gaps in resource-poor languages filled by statistical machine translation. The result is an "encyclopedic dictionary" that provides concepts and named entities lexicalized in many languages and connected by a large number of semantic relations <ref type="bibr" target="#b20">[21]</ref>. Additional vocabulary and definitions are added by linking to free resources such as WordNet, OmegaWiki, English Wiktionary, Wikidata, FrameNet, VerbNet and others. Like WordNet, BabelNet groups words into sets of synonyms, in its case across different languages, called Babel synsets. For each Babel synset, BabelNet provides short definitions (called glosses) in many languages, taken from both WordNet and Wikipedia.</p><p>In the future, it is planned to use the Babelscape product <ref type="bibr" target="#b21">[22]</ref>, which makes it possible to analyze documents, perform semantic markup of texts, build semantic knowledge graphs in several languages, etc., but this requires additional careful study <ref type="bibr" target="#b14">[15]</ref>.</p><p>The dictionaries of terms and megalemmas proposed within the project make it possible not only to classify texts, but also to identify implicit links between articles.</p><p>The structure of the glossary is represented by a tuple:</p><formula xml:id="formula_1">Dterm = &lt;IDterm, Term&gt;,<label>(1)</label></formula><p>where Dterm is the glossary of terms, IDterm is the term identifier in the dictionary, and Term is the term.</p><p>The structure of the megalemma dictionary is represented by a tuple:</p><formula xml:id="formula_2">Dmeg = &lt;IDmeg, MegL&gt;,<label>(2)</label></formula><p>where Dmeg is the megalemma dictionary, IDmeg is the megalemma identifier in the dictionary, and MegL is the megalemma.</p><p>The structure of the document dictionary is represented by a tuple:</p><formula>Ddoc = &lt;IDdoc, NAMEdoc, SRCdoc, YEARdoc, NUMwrd&gt;,<label>(3)</label></formula><p>where Ddoc is the document dictionary, IDdoc is the document identifier in the dictionary, NAMEdoc is the document name, SRCdoc is the publication source, YEARdoc is the publication year, and NUMwrd is the total number of terms in the document.</p><p>The structure of the domain dictionary is represented by a tuple:</p><formula xml:id="formula_3">Dsa = &lt;IDsa, SA&gt;,<label>(4)</label></formula><p>where Dsa is the domain dictionary, IDsa is the domain identifier in the dictionary, and SA is the domain name.</p><p>While the Dterm dictionary is a general glossary of terms, the dictionary of terms of a document contains the terms of that document and the frequency of occurrence of each term in it. The same applies to the dictionary of megalemmas of a document. These two dictionaries are associative tables in the database. An associative table implements a many-to-many relationship between entities.</p><p>The structure of the dictionary of terms of the document is represented by a tuple:</p><formula>Dtd = &lt;IDterm, IDdoc, Fterm&gt;,<label>(5)</label></formula><p>where Dtd is the dictionary of terms of the document and Fterm is the relative frequency of occurrence of the term in the document, calculated as follows: first, all insignificant words (stop words, rare words, etc.) are removed from the document so that only the terms remain; then the frequency of occurrence of the term is divided by the total number of terms in the document. 
The structure of the dictionary of megalemmas of the document is represented by a tuple:</p><formula>Dmd = &lt;IDmeg, IDdoc, Fmeg&gt;,<label>(6)</label></formula><p>where Dmd is the dictionary of megalemmas of the document and Fmeg is the relative frequency of the megalemma in the document, calculated as the frequency of the megalemma divided by the total number of megalemmas in the document.</p><p>The structure of the keyword dictionary is represented by a tuple:</p><formula>Dkeywrd = &lt;IDterm, IDsa&gt;,<label>(7)</label></formula><p>Keywords are taken from the general glossary of terms and mapped to a subject area. This is also an associative table.</p><p>The structure of the dictionary that maps documents to subject areas is represented by a tuple:</p><formula>Ddsa = &lt;IDdoc, IDsa&gt;,<label>(8)</label></formula><p>where Ddsa is the dictionary of subject areas of a document. One document can belong to several subject areas.</p></div>
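<div xmlns="http://www.tei-c.org/ns/1.0"><p>The relative-frequency dictionaries defined above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the stop-word list and the in-memory data layout are assumptions made for illustration, and the removal of rare words mentioned in the text is omitted. The sketch builds the glossary Dterm and the rows of Dtd, computing Fterm as the count of a term divided by the number of terms left in the document after filtering.</p>
<p>
```python
from collections import Counter

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "of", "a", "and", "in", "is"}


def relative_term_frequencies(tokens, stop_words=STOP_WORDS):
    """Return a dict mapping each term of one document to its Fterm.

    Insignificant words are dropped first; each remaining term's count is
    then divided by the total number of remaining terms.
    """
    terms = [t.lower() for t in tokens if t.lower() not in stop_words]
    total = len(terms)
    if total == 0:
        return {}
    counts = Counter(terms)
    return {term: count / total for term, count in counts.items()}


def build_dtd(documents):
    """Build the Dterm glossary and the Dtd rows (IDterm, IDdoc, Fterm).

    documents maps a document identifier (IDdoc) to its list of tokens.
    """
    dterm = {}  # maps Term to IDterm
    dtd = []    # association rows linking terms and documents
    for id_doc, tokens in documents.items():
        for term, f_term in relative_term_frequencies(tokens).items():
            # Reuse the identifier of a known term, otherwise assign the next one.
            id_term = dterm.setdefault(term, len(dterm) + 1)
            dtd.append((id_term, id_doc, f_term))
    return dterm, dtd


docs = {1: "the underwater vehicle and the vehicle sensor".split()}
dterm, dtd = build_dtd(docs)
for row in dtd:
    print(row)
```
</p><p>The same computation would apply to Dmd, with megalemma occurrences counted instead of single terms.</p></div>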
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>A program was developed that implements topic modeling methods and identifies implicit links between documents <ref type="bibr" target="#b22">[23]</ref>. The megalemma dictionary is used to determine implicit references: the task is to determine the source of a megalemma and the link to it. A storage structure and methods for constructing a multilingual collection of synsets (synonymous series) were developed.</p><p>A neural network algorithm was developed that uses token tagging (flagging) and the Word2vec method, modified by the team of authors as already described, to identify Russian-language terms in texts that are similar in contextual lexical meaning <ref type="bibr" target="#b23">[24]</ref>.</p><p>The methodology for forecasting the development of new directions uses the ratio of the relative frequencies of occurrence of the same megalemma calculated over adjacent years. This approach avoids the need to retrain neural networks as information accumulates.</p><p>An analysis of clustering and thematic modeling methods for assessing the quality / significance of texts was carried out <ref type="bibr" target="#b25">[25]</ref>. Various thematic modeling methods were considered, including the vector model, latent semantic analysis, latent Dirichlet allocation, and others. These methods are based on a probabilistic approach, i.e. a term or document is associated with several topics with a certain degree of probability. The disadvantage of this approach is that the list of topics is formed automatically.</p></div>
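<div xmlns="http://www.tei-c.org/ns/1.0"><p>The year-over-year ratio of relative frequencies used for forecasting can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the input layout (one megalemma, relative frequency per year) and the decision to skip years whose predecessor is missing or has zero frequency are not taken from the source.</p>
<p>
```python
def adjacent_year_ratios(freq_by_year):
    """Map each year to F(year) / F(year - 1) for a single megalemma.

    freq_by_year maps a year to the relative frequency of the megalemma
    in the documents of that year. A year is skipped when the previous
    year is absent or has zero frequency, since the ratio is undefined.
    """
    ratios = {}
    for year in sorted(freq_by_year):
        previous = freq_by_year.get(year - 1)
        if previous:  # None (missing year) and 0.0 are both skipped
            ratios[year] = freq_by_year[year] / previous
    return ratios


# Made-up frequencies for one hypothetical megalemma.
trend = adjacent_year_ratios({2017: 0.002, 2018: 0.003, 2019: 0.006})
for year, ratio in trend.items():
    print(year, round(ratio, 2))
```
</p><p>Under this reading, a ratio above 1 would suggest that the megalemma is used relatively more often than in the previous year; how such ratios are turned into forecasts is not specified in the text.</p></div>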
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>This research is expected to yield a number of results of high scientific and applied significance: 1. An up-to-date multilingual collection of scientific texts, containing more than 60 thousand scientific documents and more than 6 thousand internal bibliographic references. This collection will make it possible to calculate the significance of documents using the scientific citation index (SCI), based on the number of bibliographic references, as well as the context scientific citation index (CSCI), based on the number of implicit references identified through the semantic similarity of texts. 2. A technique for the automatic formation of a multilingual associative-hierarchical portrait of a subject area (MAHPSA) containing a hierarchy of multilingual synonymous series (synsets). MAHPSA can be used to solve a wide range of problems, including calculating the semantic similarity of texts, identifying multilingual plagiarism, and expanding queries in multilingual search. 3. A methodology and algorithms for calculating integrated multilingual statistics based on MAHPSA, including the identification of significant documents, trends and promising areas. By applying the technique to a multilingual collection, new concepts will be revealed, the dynamics of their development over time will be examined, and promising areas for the development of the subject area will be identified. On this basis, it will be possible to build forecasts of promising areas of research. 4. A methodology for integrating MAHPSA with other ontologies and linguistic resources, including BabelNet, which contains millions of multilingual synsets. As a result, the shortcomings of BabelNet related to the low coverage of Russian terms will be overcome. 
For the integrated resources, updated significance ratings of documents will be calculated and updated forecasts of promising research areas in the selected subject areas will be constructed.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Tokenization</figDesc><graphic coords="3,53.85,53.28,506.35,655.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="5,101.18,53.28,392.37,268.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="5,117.38,358.46,360.60,353.98" type="bitmap" /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgment</head><p>The reported study was funded by RFBR according to the research projects № 18-07-00225, 18-07-00909, 18-07-01111, 19-07-00455 and 20-04-60185.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Galbraith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Thayer</surname></persName>
		</author>
		<idno>draft-ietf-secsh-publickeyfile-01</idno>
		<title level="m">SECSH Public Key File Format</title>
				<imprint>
			<date type="published" when="2001-03">March 2001</date>
		</imprint>
	</monogr>
	<note>work in progress material</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">PullEnty system -information extraction from natural language texts and automated building of information systems</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Sharnin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Klimenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">I</forename><surname>Kuznetsov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Situational centers and information-analytical systems of class 4i for monitoring and security tasks (SCVRT2015-16)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="28" to="35" />
		</imprint>
	</monogr>
	<note>Proceedings of the International Scientific Conference: in 2 volumes</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The principles of constructing models of business processes in the subject area based on natural language text processing</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">B</forename><surname>Kozerenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Sharnin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Complex systems: models, analysis and control</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="82" to="88" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Methods and tools for domain modeling</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Civilization of Knowledge: Problems and Prospects of Social Communications. Proceedings of the XIII International Scientific Conference</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="71" to="72" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Project management</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">P</forename><surname>Zolotareva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Yashkova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Educational-methodical manual / Nizhny Novgorod</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Formalization of knowledge about the subject area based on the analysis of natural language structures</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The civilization of knowledge: the problem of man in science of the XXI century. Proceedings of the XII International Scientific Conference</title>
				<meeting>the XII International Scientific Conference</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="78" to="80" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Methods of extracting knowledge from natural language texts and building business process models based on the allocation of processes, objects, their relationships and characteristics</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Sharnin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Scientific Conference CPT2014</title>
				<meeting>the International Scientific Conference CPT2014</meeting>
		<imprint>
			<publisher>Institute of Computing for Physics and Technology</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="92" to="98" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Extracting and processing knowledge from unstructured texts of the business sphere and social networks</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Sharnin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Somin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Social computing: fundamentals, development technologies, social and humanitarian effects. Materials of the Fourth International Scientific and Practical Conference</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="364" to="371" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Analytical intelligence based on the analysis of unstructured information from various sources, including the Internet and the media</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">B</forename><surname>Kozerenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Sharnin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Series: Complex systems: models, analysis and control</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="49" to="54" />
		</imprint>
		<respStmt>
			<orgName>New University</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">New approaches in constructing the functional structure of the subject area</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Twenty Years of Post-Soviet Russia: crisis phenomena and modernization mechanisms. Materials of the XIV All-Russian Scientific and Practical Conference of the Humanitarian University: in 2 volumes</title>
				<meeting><address><addrLine>Ekaterinburg</addrLine></address></meeting>
		<imprint>
			<publisher>Humanitarian University</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="639" to="643" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">A semantic approach to the analysis of terrorist activity on the Internet based on thematic modeling methods</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Sharnin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Klimenko</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A semantic approach to the analysis of terrorist activity on the Internet based on thematic modeling methods</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Zolotarev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Sharnin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Klimenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bulletin of the Russian New University. Series: Complex systems: models, analysis and control</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="64" to="71" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Semantic processing of unstructured textual data based on the linguistic processor PullEnti</title>
		<title level="j">Informatics and Applications</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">B</forename><surname>Kozerenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">I</forename><surname>Kuznetsov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Romanov</surname></persName>
		</author>
		<idno type="DOI">10.14357/19922264180313</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="91" to="98" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Named entity recognition with bidirectional LSTM-CNNs</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Chiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Nichols</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1511.08308</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1802.05365</idno>
		<title level="m">Deep contextualized word representations</title>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network</title>
		<author>
			<persName><forename type="first">Roberto</forename><surname>Navigli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simone</forename><forename type="middle">Paolo</forename><surname>Ponzetto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">193</biblScope>
			<biblScope unit="page" from="217" to="250" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Semantic Web Programming</title>
		<author>
			<persName><forename type="first">John</forename><surname>Hebeler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matthew</forename><surname>Fisher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ryan</forename><surname>Blace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><surname>Perez-Lopez</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>John Wiley &amp; Sons</publisher>
			<biblScope unit="page">648</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Methods for finding solutions by a group actor with a low probability of error</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">I</forename><surname>Protasov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">E</forename><surname>Potapova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">O</forename><surname>Mirakhmedov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Sharnin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">B</forename><surname>Minasyan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CPT2019: Materials of the international scientific conference of the Nizhny Novgorod State University of Architecture and Civil Engineering and the Scientific and Research Center for Information in Physics and Technique</title>
				<meeting><address><addrLine>Nizhny Novgorod</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="284" to="291" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">RDF vocabulary description language 1.0: RDF schema W3C working draft</title>
		<author>
			<persName><forename type="first">D</forename><surname>Brickley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">V</forename><surname>Guha</surname></persName>
		</author>
		<ptr target="http://www.w3.org/TR/2002/WD-rdf-schema-20020430/" />
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ehrmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Cecconi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Vannella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>McCrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cimiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
		<ptr target="http://wwwusers.di.uniroma1.it/~navigli/pubs/LREC_2014_Ehrmannetal.pdf" />
		<imprint>
			<date type="published" when="2014">2014</date>
			<publisher>LREC</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project</title>
		<author>
			<persName><forename type="first">T</forename><surname>Flati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Vannella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Pasini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014)</title>
				<meeting>the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014)<address><addrLine>Baltimore, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">June 22-27, 2014</date>
			<biblScope unit="page" from="945" to="955" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A tool for effective extraction of synsets and semantic relations from BabelNet</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ustalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Panchenko</surname></persName>
		</author>
		<idno type="DOI">10.1109/SSDSE.2017.8071954</idno>
		<ptr target="https://doi.org/10.1109/SSDSE.2017.8071954" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 Siberian Symposium on Data Science and Engineering (SSDSE)</title>
				<meeting>the 2017 Siberian Symposium on Data Science and Engineering (SSDSE)</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page">8071954</biblScope>
		</imprint>
		<respStmt>
			<orgName>Institute of Electrical and Electronics Engineers Inc</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">BabelNetXplorer: a platform for multilingual lexical knowledge base access and exploration</title>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">P</forename><surname>Ponzetto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Companion Volume to the Proceedings of the 21st World Wide Web Conference</title>
				<meeting><address><addrLine>Lyon, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012-04-20">16-20 April 2012</date>
			<biblScope unit="page" from="393" to="396" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Best Topic Word Selection for Topic Labelling</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Lau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Newman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Karimi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Baldwin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">COLING&apos;10 Proceedings of the 23rd International Conference on Computational Linguistics</title>
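				<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<pubPlace>Stroudsburg, PA</pubPlace>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="605" to="613" />
		</imprint>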
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m">Association for Computational Linguistics</title>
				<meeting><address><addrLine>Stroudsburg, PA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="605" to="613" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<ptr target="https://cloud.google.com/ml-engine/docs/tutorials/python-guide" />
		<title level="m">Google Cloud Machine Learning</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">Integrating document clustering and topic modeling</title>
		<author>
			<persName><forename type="first">Pengtao</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eric</forename><forename type="middle">P</forename><surname>Xing</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1309.6874</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
	<note>About the authors. Zolotarev Oleg V., Ph.D., Docent, ANO HE «Russian New University», Moscow, Russia. ol-zolot@yandex</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
