<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Automatic smart subword segmentation for the reverse Ukrainian physical dictionary task</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Maksym</forename><surname>Vakulenko</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Darmstadt University of Applied Sciences</orgName>
								<address>
									<addrLine>Schoefferstrasse 3</addrLine>
									<postCode>64295</postCode>
									<settlement>Darmstadt</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Institute of Problems of Artificial Intelligence</orgName>
								<address>
									<addrLine>Prospekt Akademika Ghlushkova 40</addrLine>
									<postCode>03187</postCode>
									<settlement>Kyjiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vadym</forename><surname>Slyusar</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Institute of Problems of Artificial Intelligence</orgName>
								<address>
									<addrLine>Prospekt Akademika Ghlushkova 40</addrLine>
									<postCode>03187</postCode>
									<settlement>Kyjiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Automatic smart subword segmentation for the reverse Ukrainian physical dictionary task</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">50CA824B1836FD1070B55BA1175E248D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:25+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>reverse dictionary</term>
					<term>subword segmentation</term>
					<term>terminology science</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This article introduces a novel method for tackling the reverse dictionary task, utilizing text segmentation into subwords. We focus on physical texts written in Ukrainian, dividing words into subwords that include morphemes, individual characters, and their combinations. Unlike word-level segmentation, the subword vocabulary is limited, thereby eliminating the issue of unknown lexical units. Unlike character-level segmentation, each subword retains a certain degree of semantic information, which allows for the construction of meaningful embeddings. We explore various combinations of language models using different levels of segmentation in the context of reverse dictionary development. This approach represents a significant advancement towards automating terminological work through the utilization of machine learning methods applied to terminology science. The findings enhance the linguistic capabilities of artificial intelligence, helping it to process terminology research with a human-like comprehension. Furthermore, the consideration of the Mixture of Experts (MoE) architecture is proposed to integrate both traditional word-based and innovative subword-based approaches. This hybrid method aims to leverage the strengths of both segmentation levels, thereby enhancing the performance of multimodal large language models (LLMs) in processing and understanding intricate linguistic structures.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>One significant aspect of natural language processing (NLP) tasks involves the generation or prediction of text or words. Reverse dictionaries, as outlined by <ref type="bibr" target="#b8">Hill et al. (2016)</ref> and <ref type="bibr" target="#b27">Yan et al. (2020)</ref>, hold promise in this domain: machine-generated lexical units are proposed based on their definitions.</p><p>Within this framework, employing subwords as fundamental linguistic units offers notable advantages over conventional methods. Compared to approaches using complete words as the smallest units, subwords circumvent the problem of unseen words, allowing new words to be constructed from an existing subword vocabulary. Unlike character-based approaches, subwords maintain a connection to the underlying semantics.</p><p>MoDaST-2024: 6th International Workshop on Modern Data Science Technologies, May 31 - June 1, 2024, Lviv-Shatsk, Ukraine. * Corresponding author. † These authors contributed equally. maxvakul@gmail.com (M. Vakulenko); swadim@ukr.net (V. Slyusar). ORCID: 0000-0003-0772-7950 (M. Vakulenko); 0000-0002-2912-3149 (V. Slyusar).</p><p>It is important to highlight that the prevalent byte-pair-encoding method of word segmentation, grounded in mathematical statistics, exhibits several drawbacks <ref type="bibr" target="#b0">(Aguilar et al., 2021)</ref>. Among these, the most troublesome is its tendency to segment compound words erroneously: "electroneutral", for instance, becomes "electron-eu-tral" instead of "electro-neutral" <ref type="bibr" target="#b4">(Church, 2020)</ref>.</p><p>At the same time, one of the major difficulties in terminology work is the need to process huge amounts of terminological data <ref type="bibr">(L'Homme, 2013)</ref>, which motivates their automated processing. 
In particular, an important part of terminology management is the prescriptive step where, according to ISO 704 (2000:vi), the prescribed (recommended) term should be chosen or created on the basis of its definition (see <ref type="bibr">Drewer and Ziegler, 2011, 164)</ref>. In this sense, the process of attributing designations to concepts in terminology science corresponds to the reverse dictionary task in NLP.</p><p>Formulating terminological (and, more generally, linguistic) tasks in terms of machine algorithms contributes to the linguistic competency of an artificial personality with artificial intelligence (see <ref type="bibr">Shevchenko et al., 2023, 27-29)</ref>, which manifests the person's ability for human-like thinking, effective lingual communication, and the so-called "accurate report". The last is considered, in turn, a significant sign of consciousness in mammals <ref type="bibr" target="#b16">(Seth et al., 2004)</ref>.</p><p>Little work of this kind has been done so far on data from low-resource languages such as Ukrainian. This paper aims to address this gap by employing symptomatic statistical and analytical methods from the field of terminology science. Specifically, we present two subword vocabularies tailored to the Ukrainian language within the domain of physics, based on the "Explanatory dictionary on physics" <ref type="bibr" target="#b20">(Vakulenko and Vakulenko, 2008)</ref>. The two resulting texts contain the simple and the composed segmentation, into individual and combined subwords respectively, which is the first step towards a reverse dictionary and other NLP tasks. We also discuss the most efficient ways to create a reverse dictionary in the field of physics and adjacent fields by means of deep learning. 
From a more general perspective, this paper takes a step towards the linguistic competency of an artificial personality with artificial intelligence (AI) that will be able to create new terms using human-like algorithms. In this way, the typical assignments of terminology science, which usually require much human work, will be transferred to machines endowed with elements of a linguistically competent AI.</p></div>
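To make the segmentation idea concrete, the following is a minimal sketch of a morpheme-aware greedy segmenter. It is not the paper's actual implementation: the toy English morpheme lists and the prefix-before-root matching order are illustrative assumptions. Unlike frequency-driven byte-pair encoding, it consults a curated morpheme vocabulary and therefore produces "electro + neutral" rather than a split like "electron-eu-tral":

```python
# Toy morpheme-aware segmenter (illustrative only; the real vocabularies
# in this work contain about 28,000 Ukrainian subword units).
PREFIXES = {"electro", "anti"}
ROOTS = {"neutral", "electron", "magnet"}
SUFFIXES = {"ic", "ism"}

def segment(word):
    """Greedy longest-match segmentation, trying prefixes before roots
    before suffixes at each position; unmatched characters fall back
    to single-character subwords."""
    subwords, i = [], 0
    while i < len(word):
        match = None
        for vocab in (PREFIXES, ROOTS, SUFFIXES):
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    match = word[i:j]
                    break
            if match:
                break
        if match is None:
            match = word[i]  # single-character fallback
        subwords.append(match)
        i += len(match)
    return subwords

print(segment("electroneutral"))  # ['electro', 'neutral']
print(segment("antimagnetic"))   # ['anti', 'magnet', 'ic']
```

Note that the class-priority order (prefixes before roots) is what prevents the longer root "electron" from swallowing the prefix "electro"; a purely longest-match strategy over a flat vocabulary would reproduce exactly the BPE-style error discussed above.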
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Method and material</head><p>In this study, we undertake a supervised learning task focused on creating reverse and domain-specific dictionaries, necessitating the compilation of a linguistic unit vocabulary during the preprocessing phase. As highlighted earlier, subword segmentation emerges as the most viable method for preserving semantics, in contrast to character-level analysis, and for circumventing the challenge of unknown words, as opposed to word-level scenarios.</p><p>The segmentation of Ukrainian texts relies on the set of Ukrainian morphemes (affixes) sourced from specialized dictionaries <ref type="bibr" target="#b18">(Sikorsjka, 1995;</ref><ref type="bibr" target="#b10">Karpilovsjka et al., 1998;</ref><ref type="bibr" target="#b14">Poljugha, 2001)</ref>. A curated collection comprising 2,000 Ukrainian roots, encompassing both commonly used and domain-specific units, has been added manually.</p><p>Our initial approach involves the utilization of individual subwords. It is important to note that subwords exhibit significant homonymy: the same combination of letters may occur in different parts of distinct words with varying meanings. We anticipate that incorporating individualized subwording into the neural network will yield averaged sense embeddings, similar to those at the word level (cf. <ref type="bibr">Loureiro et al., 2021, p. 388)</ref>. Additionally, as an analogue to contextual embedding models for words, we will elaborate a vocabulary of combined subwords, wherein each sense corresponds to a combination of elementary subwords, if applicable. We hypothesize that this second approach will yield a more specific neural network output. 
A comparative analysis of the results obtained from the two approaches can provide insights into the extent to which neural network predictions rely on the preliminary preparation of input data.</p><p>The definitions and explanations of terms are drawn from the "Explanatory dictionary on physics" <ref type="bibr" target="#b20">(Vakulenko and Vakulenko, 2008)</ref> which, after the removal of in-text cross-references, comprises 6,068 distinct entries. The resulting subword vocabularies contain approximately 28,000 units each. The free Microsoft transliteration tool has been utilized to facilitate automatic text segmentation based on rules embodying both approaches.</p><p>To rank the predicted terms according to their applicability, we suggest using the apt term criteria formulated in a machine-friendly manner (see Vakulenko, 2024):</p><p>1. Exactness (the concordance between the term's meaning and its morphological structure) is understood as the cosine similarity (degree of entailment) between the definition and the corresponding vocabulary entry.</p><p>2. Essentiality (coverage of the key aspects of the concept and absence of false associations) is determined as the ratio between the largest entailment degree and the second-largest degree, as taken from the dictionary explanations.</p><p>3. Plainness (a clear inner form of a term) is calculated as the ratio of the number of subwords in the term coinciding with the subwords in its definition, to the total number of subwords.</p><p>4. Derivativity (the ability to easily create derivatives of the word) is estimated as the absence of "nnja" and "ttja" in the word ending and the ability to add subwords to the existing word stem. The transliteration is carried out according to the National transliteration standard (DSTU, 2022; see also <ref type="bibr">Vakulenko, 2023b)</ref>.</p><p>5. 
Good sound (the agreement with phonotactic rules) is regarded as the absence of clusters of more than two different consonants (except "str", "zdr", "spr", "zbr", "skr", "skl", "stv", "zdv", "ntr", "ndr", "ntv", "ndv"); the absence of "ngh" followed by a consonant or at the word end; the absence of "shr" and "zhr"; the absence of two different neighboring vowels (except when the second is "u"); the absence of "ry", "ghy"; the absence of "bv", "bf", "pv", "pf", "mf", "mv", "lr", "ljr", "ljs", "ljsh"; the absence of final consonant clusters (except "sk", "lk", "nt", "st", "stj").</p><p>6. Systemic feature, or systemness (reflection in the designation of belonging to a particular class of concepts) is assessed as the availability of the same form among other dictionary entries, resulting in meronyms or hypernyms (hyponyms).</p><p>7. Organic nature, or organicity (conformance with spelling and language tendencies) is evaluated as the inverse number of maximum-length subwords.</p><p>8. Compatibility (the ability to be combined in terminological combinations) is estimated as the valence of the term or, if newly coined, of its closest analogs.</p><p>9. Unambiguity is estimated as the inverse of the total number of definitions in the dictionary corresponding to the term entry.</p><p>10. Nominativity (as opposed to a descriptive attribute) is calculated according to the formula Knom = 1/(1 + nconj + nend), where nconj is the number of conjunctions in the collocation and nend is the number of verb endings "ty", "tysja", "tysj".</p><p>11. Brevity is estimated as the inverse number of symbols in the term (or the inverse number of sounds).</p><p>This selection of criteria is preferable to those described previously in rules regulating terminological work. In particular, the German standard DIN 2330 <ref type="bibr" target="#b7">(1993, 8)</ref> determines the following basic lingual requirements for terms: exactness (Ger. Genauigkeit), brevity (Ger. Knappheit), orientation towards accepted language usage (Ger. Orientierung am anerkannten Sprachgebrauch), motivation (Ger. Motiviertheit), derivability (Ger. Ableitbarkeit), absence of connotations (Ger. Konnotationsfreiheit), speakability (Ger. Sprechbarkeit), linguistic correctness / logic (Ger. sprachliche Korrektheit / Logik), clarity (Ger. Eindeutigkeit) (see <ref type="bibr">Drewer and Ziegler, 2011, 173-175)</ref>. For example, exactness is understood here as a complex requirement combining a one-to-one correspondence between a notion and its name with the motivational clarity of a term. Such complex benchmarks should be split into simple ones, as has been done in our apt term criteria.</p></div>
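As a hedged sketch of how several of the eleven criteria above could be computed in a machine-friendly way (the function names and signatures are our own assumptions, not the paper's code; only criteria 1, 3, 10, and 11 are shown, and criterion 1 assumes precomputed embedding vectors):

```python
import math

def exactness(def_vec, entry_vec):
    """Criterion 1: cosine similarity between the definition embedding
    and the corresponding vocabulary-entry embedding."""
    dot = sum(a * b for a, b in zip(def_vec, entry_vec))
    norm = math.sqrt(sum(a * a for a in def_vec)) * math.sqrt(sum(b * b for b in entry_vec))
    return dot / norm

def plainness(term_subwords, definition_subwords):
    """Criterion 3: share of the term's subwords that also occur in its definition."""
    shared = sum(1 for s in term_subwords if s in set(definition_subwords))
    return shared / len(term_subwords)

def nominativity(n_conjunctions, n_verb_endings):
    """Criterion 10: Knom = 1 / (1 + nconj + nend), where nend counts the
    verb endings "ty", "tysja", "tysj"."""
    return 1.0 / (1 + n_conjunctions + n_verb_endings)

def brevity(term):
    """Criterion 11: inverse number of symbols in the term."""
    return 1.0 / len(term)

print(nominativity(0, 1))  # 0.5
print(brevity("term"))     # 0.25
```

A complete apt-term scorer would combine all eleven criteria; how they are weighted against one another is left open in the text and would be a design decision of the implementer.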
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head><p>The code generating the vocabularies of simple and combined Ukrainian subwords (Phys-Ukr) has been presented in <ref type="bibr" target="#b24">(Vakulenko, 2024</ref>).</p><p>The full text of the "Explanatory dictionary on physics", subworded into simple (individual) and combined (composite) subwords, is available on GitHub: https://github.com/Mova-2020/Subworded-Explanatory-Dictionary-on-Physics-/tree/main.</p></div>
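The subworded texts mark subword boundaries with the "&amp;" character, as in the examples discussed in this article (e.g. &amp;vy&amp;sl&amp;a&amp;n&amp;). A minimal parser for this notation — assuming that exact delimiter convention, which may differ in detail from the published files — might look like:

```python
def parse_subwords(marked):
    """Split an '&'-delimited subworded token such as '&vy&sl&a&n&' into
    its subwords, dropping the empty fragments produced by the leading
    and trailing markers."""
    return [part for part in marked.split("&") if part]

def avg_subwords_per_word(marked_text):
    """Average number of subwords per token in a whitespace-separated
    subworded text (the paper reports 4-5 subwords per word on average)."""
    tokens = marked_text.split()
    return sum(len(parse_subwords(t)) for t in tokens) / len(tokens)

print(parse_subwords("&vy&sl&a&n&"))                 # ['vy', 'sl', 'a', 'n']
print(avg_subwords_per_word("&vy&sl&a&n&yj& &dal&ek&yj&"))  # 4.0
```

The two-line toy corpus here is a made-up example for illustration; the statistics reported in the paper come from the full subworded dictionary.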
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion</head><p>The same character combinations may necessitate different segmentation in various words, a phenomenon that can be observed within a terminology science framework utilizing a symptomatic statistical method <ref type="bibr">(Vakulenko, 2014, 19-23;</ref><ref type="bibr">Vakulenko, 2023a, 123-132)</ref>. Unlike mathematical statistics, which deals with strict quantities, symptomatic statistics focuses more on qualitative occurrences and tendencies. Consequently, segmentation based on symptomatic statistics may differ from that favored by mathematical statistics, which tends to prioritize subword division according to the most "frequent" character combinations, disregarding alternative variants. However, accounting for different combinations of subwords leads to various patterns with differing probabilities.</p><p>For example, the letter combination "abcd" may be split into "ab&amp;cd" with a 50% probability, "a&amp;bcd" with a 30% probability, and "abc&amp;d" with a 20% probability. Initially, the first variant may seem preferable, but this preference can change significantly with the addition of another letter. For instance, the split "ab&amp;cde" may have a 10% probability, leaving 90% for "abc&amp;de".</p><p>The subword vocabulary derived from the "Explanatory dictionary on physics" <ref type="bibr" target="#b20">(Vakulenko and Vakulenko, 2008</ref>) contains numerous such units. For instance, the formant "vys" may appear in words like "vysylaty" ('emit'), where the first two letters belong to the prefix and the third is the initial letter of the root, as well as in "vysity" ('hang'), where this formant represents the root. To differentiate the formant "vysl" appearing in the words "provyslyj" ('sagging') and "vyslanyj" ('emitted'), we introduce the additional subword combination &amp;vy&amp;sl&amp;a&amp;n&amp;, which applies to the latter word. 
Similarly, to distinguish the homonymic formants "dal" as in "dala" ('gave') and in "dalekyj" ('far'), we use the subwordings &amp;da&amp;l&amp;a&amp; and &amp;dal&amp;ek&amp;, respectively. The formant "ynni" may belong to the adjective "polovynni" ('half'), containing the suffixes "yn" and "n", or to the noun "rjabotynni" ('ripples'), with the differing suffixes "y" and "nn". In this case, the most detailed segmentation, which enables all possible variants, is provided: &amp;y&amp;n&amp;n&amp;i&amp;.</p><p>Moreover, the frequencies of such divisions may vary significantly depending on the domain.</p><p>Given that many terms are internationalisms, the neural network is expected to predict terms composed of international elements. To accommodate this, subwords corresponding to international roots and affixes are introduced. For example, the stem "vizualjn" ('visual') is segmented into &amp;viz&amp;u&amp;alj&amp;n&amp;.</p><p>This application of the symptomatic statistical method mirrors human-generated knowledge, which is pertinent to the reverse dictionary task. On the other hand, the predictions of the neural network align with the analytical method, imbuing the methods of terminology science with a machine learning interpretation, which represents a significant step toward intelligent execution of various terminological tasks. This supervised training enables the machine to emulate human thinking processes.</p><p>The text of the "Explanatory dictionary on physics" <ref type="bibr" target="#b20">(Vakulenko and Vakulenko, 2008</ref>), subworded according to the described vocabularies, contains on average 4-5 subwords per word and is devoid of errors such as "*electron-eutral". 
Terms stemming from indigenous Ukrainian roots exhibit more similarity with their explanations than international terms do.</p><p>The practical implementation of the proposed approach consists, first of all, in training embeddings for Ukrainian subwords (composed and simple) using transformers and other architectures.</p><p>At the same time, taking into account the significant prior work on creating vector databases within the framework of the traditional approach using word dictionaries, it is advisable to consider combining the proposed morpheme-based segmentation with known methods of tokenization and vector embedding of whole words. This can significantly improve the performance of NLP models, including those designed for reverse dictionary creation tasks.</p><p>The effective integration of traditional and proposed methods may rely on various technologies covering the key stages of textual data processing.</p><p>First of all, this means hybrid tokenization, with the segmentation of texts simultaneously at the level of morphemes and words. This dual approach allows the language model to track both the semantic nuances provided by morphemes and the contextual information encapsulated in full words. In some cases, especially for processing unknown words or for lexical units with less clear morphological boundaries, character-level segmentation should also be included as an additional level of subword analysis.</p><p>The next object of modification is the stage of vector embeddings, where changes can be made in three important directions:</p><p>• embedding based on morphemes, which will capture the semantic and syntactic properties of the entire variety of morphemes in vector form. 
This can be achieved by training on a large corpus of morphologically annotated texts or by adapting existing word embeddings to morphemes using subword information;</p><p>• word embedding with morphological awareness, which consists of combining the process of morpheme embedding with the formation of word embeddings, ensuring that the resulting word vectors reflect the contribution of individual morphemes. The appropriate unification can be done using weighted averaging or special neural architectures trained to compose morpheme embeddings into word embeddings;</p><p>• contextual embedding using language models such as bidirectional encoder representations from transformers (BERT) or its derivatives capable of generating context-sensitive embeddings. These models can be fine-tuned on morpheme-segmented text to produce embeddings sensitive to the morphological structure of words in a given context.</p><p>Tuning the architecture of the language model covers two main aspects: (i) the inclusion of morphological information in the input layer of the language model and (ii) the corresponding adaptation of the attention mechanism. For example, the large language model (LLM) input layer should be designed to accept morpheme representations alongside traditional word tokens. This can be implemented using parallel input channels for the relevant data or on the basis of a unified representation that combines information at the level of morphemes and words, for example, via a concatenation operation. Changes in the attention mechanism are driven by the need to allow the model to focus on relevant morphemes or word segments when predicting or generating representations. This is especially useful for tasks that depend on understanding subtle semantic differences.</p><p>The learning strategy for the modified LLM architecture is based on joint training on morphological and semantic tasks. 
Training should consist of a combination of tasks that require both morphological understanding (e.g., morpheme segmentation, part-of-speech tagging) and semantic understanding (e.g., recognition of word meanings, reverse dictionary lookup). This prompts the model to develop representations that are informative at both levels.</p><p>Transfer learning and fine-tuning procedures can be used to simplify the learning process, taking a pre-trained embedding and a language model as a starting point and further refining them on a corpus of text annotated with morphological information. This approach can significantly reduce the training time and improve the performance of the language model by relying on existing linguistic knowledge.</p><p>Specific evaluation metrics that take into account both morphological accuracy and semantic relevance can be used to evaluate training effectiveness, ensuring that the integrated approach effectively supports the target NLP tasks.</p><p>It should be noted that integrating morpheme-based segmentation with traditional tokenization and embedding methods will initially require iterative refinement based on feedback and task-specific requirements. However, with a well-thought-out integration of morpheme-based segmentation and traditional NLP methods, one can hope to create Ukrainian-language models that take linguistic nuances into account and are robustly context-aware. This will lead to improved LLM performance in a wide range of language understanding tasks, including but not limited to reverse dictionary creation.</p><p>Looking at the positive aspects of combining word vectors with subword or morpheme vectors from a wider perspective, it is important to emphasize that this can significantly improve the ability of NLP models to understand and process language. 
At the same time, the beneficial effect of grouping words with similar meanings into common clusters will be preserved and strengthened, which will affect the process of finding synonyms and working with language structures in several ways. In particular, semantic accuracy will improve, because integrating morpheme or subword vectors with whole-word vectors can help models better understand the semantic relationships between words, especially since many words share morphemes that indicate relatedness or semantic proximity. For example, words with the same prefixes or suffixes often have similar meanings or belong to the same semantic category. This can make the process of finding semantic cognates more accurate and efficient.</p><p>In addition, the use of morpheme vectors allows us to enrich the vector space by providing additional dimensions to distinguish between words that may appear similar in meaning but differ in usage or connotation. This will allow the LLM to better navigate the nuances of language and distinguish between words with subtle differences in meaning.</p><p>Integrating morpheme vectors with whole-word vectors can make the search for synonyms, antonyms, heteronyms, and other semantically related lexical units more flexible. Through morpheme analysis, language models can identify such units not only based on complete similarity of word forms but also based on commonality of morpheme components, which can reveal a wider range of semantic relationships.</p><p>Another positive effect is improved processing of newly created words. Models that use both whole-word and morpheme vectors do better with newly created or rarely used words because they can interpret their meaning based on known morphemes. 
This enhances the model's ability to find semantically related units and understand language even when LLMs encounter unfamiliar terms.</p><p>Thus, the integration of morpheme vectors with word vectors not only preserves the beneficial effect of grouping similar words into common clusters but also greatly expands the potential of NLP models for understanding and processing linguistic data. It allows us to better capture semantic relations, enrich the vector space, and increase accuracy and flexibility when finding synonyms. This approach makes it possible to create deeper and more extensive language models, capable of understanding not only the surface content of the text but also the deep structure and meaning of individual language units.</p><p>Combining two different types of vector data at the LLM input is not the only possible solution. Another approach is to use two different LLMs independently, one focused exclusively on processing traditional word vectors and the other on embedding only subword vectors, and then to combine these different architectures into one through a special merge operation. Using different LLMs in this way is a new strategy for building complex NLP systems: it allows one to use specialized models for different aspects of language analysis and then combine their strengths to achieve better performance on specific tasks. Let us consider its main stages in more detail.</p><p>Step 1. Preparation of two variants of language models. A model for traditional word vectors is trained or fine-tuned for NLP tasks using standard word vector bases. 
This can be, for example, BERT, a generative pre-trained transformer (GPT), Mistral, or any other model optimized for working with full-format words and their context.</p><p>The subword vector embedding model specializes in parsing and using subword vectors, such as morphemes or character n-grams. This model can be adapted for a deeper understanding of the morphological structure of language and used for tasks that require more detailed linguistic analysis. Each model is trained independently to process input data in its specialized domain to solve tasks such as classification, information summarization, and semantic analysis. The output of these LLM variants is vector representations or other forms of output specific to a particular task.</p><p>Step 2. Fusion of model outputs. After obtaining the results from both types of models, these results can be combined using several different methods. The simplest is to concatenate the outputs of both models into one longer vector before further processing or classification. For a more refined combination, an attention mechanism can be applied, which determines the importance of each element of the output of both models for a specific task. It is also possible to develop and train an additional layer or neural network that specializes in merging the outputs of the two models, optimizing the merging process for specific tasks.</p><p>The effectiveness of this approach depends on the ability of the fusion procedure to integrate information from both sources with high quality. Combining the outputs of different models makes it possible to exploit each model's particular strengths, providing flexibility and the possibility of deeper data analysis. At the same time, models of different sizes and with different numbers of layers can be used. 
However, such text processing also requires careful planning and tuning of the fusion process and can increase computational costs due to the need to manage multiple models.</p></div>
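The fusion options of Step 2 can be sketched as follows. This is a minimal illustration with plain Python lists standing in for model output vectors; the scalar gate is a stand-in for a learned attention weight or a trained fusion layer, not a description of any particular system:

```python
def concat_fusion(word_vec, subword_vec):
    """Simplest fusion: concatenate the word-level and subword-level
    model outputs into one longer vector."""
    return list(word_vec) + list(subword_vec)

def gated_fusion(word_vec, subword_vec, gate):
    """Weighted fusion with a scalar gate in [0, 1] (a toy stand-in for
    a learned attention weight); both vectors must have equal length."""
    assert len(word_vec) == len(subword_vec)
    return [gate * w + (1.0 - gate) * s for w, s in zip(word_vec, subword_vec)]

print(concat_fusion([1.0, 2.0], [3.0]))            # [1.0, 2.0, 3.0]
print(gated_fusion([1.0, 0.0], [0.0, 1.0], 0.75))  # [0.75, 0.25]
```

In a real system the gate would itself be produced by a small trained network conditioned on the input, and the fused vector would feed the downstream classifier or decoder.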
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>The main methods to merge language models in the Mergekit framework <ref type="bibr">(Goddard, 2024)</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A more advanced option for merging different LLMs, which has been developing intensively in recent years, is to combine their architectures using the special Mergekit framework <ref type="bibr" target="#b7">(Goddard, 2024)</ref>. Its distinctive feature is that the resulting model has the same size and the same number of parameters as the models subjected to the merging procedure. The list of the main methods of this type is presented in Table <ref type="table">1</ref>.</p><p>Fig. <ref type="figure" target="#fig_1">1</ref> illustrates the process of combining four pre-trained language models into one using the combined DARE TIES method. In this way, for example, a language model with a physical-technical orientation could be implemented if not only the physical subword dictionary considered above but also a similarly formed technical dictionary were used to train the combined models. One of the first works promoting this type of architecture is the monograph by Zhi-Hua Zhou (2012). In the corresponding structure of the expert system (Fig. <ref type="figure">2</ref>), the weight vectors of the output results of several experts were controlled with the help of a special control gateway.</p></div>
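The expert-gating idea just described (a gateway weighting the outputs of several experts, with only a few experts active per input) can be sketched as follows. The scalar "experts" and the fixed router scores are toy assumptions standing in for full LLMs and a trained router network:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_output(x, experts, router_scores, top_k=2):
    """Route input x to the top_k experts by router score and combine
    their outputs with renormalized softmax weights; the remaining
    experts are never evaluated, which saves compute."""
    weights = softmax(router_scores)
    top = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)[:top_k]
    norm = sum(weights[i] for i in top)
    return sum(weights[i] / norm * experts[i](x) for i in top)

# Three toy 'experts' (scalar functions standing in for word-level and
# subword-level LLM pairs); the router picks the two best-scoring ones.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x]
print(moe_output(3.0, experts, router_scores=[2.0, 1.0, -1.0], top_k=2))
```

Because only the top-k experts run, adding more experts grows the model's capacity without growing its per-token compute proportionally, which is the efficiency argument made for MoE below.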
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 2:</head><p>The classic structure of a mixture of experts <ref type="bibr">(Zhou, 2012, 94)</ref></p><p>This approach is very close to the operation of the multiple-LLM merging procedure described above. In this way, the outputs of LLMs with input word embeddings and of individual LLMs with subword embeddings must be combined by weight processing controlled by a special gateway.</p><p>The modern concept of MoE is an advanced approach in machine learning that makes it possible to create highly adaptive models by combining the outputs of a set of "expert" subnetworks. This approach was developed in the context of LLMs such as Mixtral to improve the efficiency and adaptability of models to different tasks or data domains.</p><p>The main idea behind MoE is to distribute input data between different "expert" models based on their specialization. Each expert is optimized to handle a specific type of information or task. After the input data are processed by several selected experts, the results of their work are combined using a switch that determines the weight of each expert for the final output of the model. The remaining experts are not involved, as shown in Fig. <ref type="figure" target="#fig_2">3</ref>  <ref type="bibr" target="#b3">(Chen et al., 2022)</ref>, which saves computing costs and reduces the requirements for available hardware resources.</p><p>In the context under consideration, each of the MoE experts is proposed to be replaced by a pair of LLMs, one of which works with traditional word embeddings and the other with a vector base of subwords in the appropriate task modality. The gating mechanism is also implemented on the basis of a separate language model, which decides how to distribute input data between the experts available in the structure and how to combine their outputs. Routers can be trained to determine which expert is best suited to handle a given incoming request. 
This principle of operation allows dynamic load distribution, adaptively redirecting the flow of input data between experts depending on the task and the context involved.</p><p>Thus, the MoE concept makes it possible to create models that adapt to a variety of data types and tasks using specialized expert clusters. Adding new experts to handle additional data types or tasks is relatively straightforward, allowing easy scaling of the model. Because the computational load can be distributed among experts, MoE can be more efficient than traditional approaches, especially under resource-constrained conditions.</p><p>In LLMs such as Mixtral, MoE has been used to build models capable of efficiently handling a wide range of linguistic data and tasks, from text classification to speech generation. The MoE application proposed by the authors makes it possible to use different expert models to process, for example, traditional word vectors and subword vectors, and then to integrate their outputs into a comprehensive understanding of the text. This approach opens new opportunities for the development of language models, allowing the creation of more powerful, flexible, and adaptive natural language processing systems.</p><p>Using separate experts for processing words and separate experts for processing subword vectors within MoE improves the flexibility and efficiency of language models and makes it possible to involve different levels of linguistic analysis, combining a deep understanding of the morphological structure of a language with contextual analysis at the level of whole words or phrases. At the same time, the expert models may have various architectures, a wide range of sizes, and different quantization levels. 
This makes it possible to compensate for the larger subword dictionaries, compared with traditional vector bases of whole words, by choosing more strongly quantized architecture variants for the expert models with morpheme embeddings.</p><p>As an illustration, Fig. <ref type="figure" target="#fig_3">4</ref> shows the relation between memory requirements and the number of tokens for different quantization levels (Q8, Q6, and Q5), obtained by the authors from inference runs of the LLM Dolfin 2.6 without a GPU. The corresponding values were measured with the LM Studio framework. As expected, memory requirements increase with the maximum number of tokens processed (horizontal axis) at all quantization levels. Notably, this dependence is linear, which was not obvious in advance. Also, higher-precision quantization (Q8) requires more memory than the lower-precision variants (Q6 and Q5), confirming that quantization effectively reduces memory requirements. The numerical values plotted in Fig. <ref type="figure" target="#fig_3">4</ref> are given in Table <ref type="table" target="#tab_2">2</ref>. Thus, quantization of the LLMs within an MoE is a key method for minimizing the computing resources needed for its operation.</p><p>The division of tasks between experts in the MoE enables each of them to specialize in a specific aspect of language analysis. For example, whole-word experts may focus on semantic and contextual relationships, while subword experts concentrate on morphological parsing and the analysis of linguistic units at a finer level.</p><p>Combining the input from experts specializing in different levels of linguistic analysis can lead to a deeper and more comprehensive understanding of a text. This is especially important for complex language tasks such as understanding allusions, idioms, or ambiguities. 
Overall, the MoE approach makes it easy to adapt the model to a variety of tasks or domains by dynamically changing the contribution of different experts depending on the context or the specifics of the data. Despite these advantages, however, training and integrating multiple specialized experts adds complexity to the model development and optimization process. In addition, effectively combining the outputs of different experts requires careful selection and tuning of the switching mechanism to ensure an optimal distribution of weights among the experts. It is important that no single expert dominates the decision-making process, as this may lead to insufficient consideration of the input from the others.</p><p>When scaling the considered approach to multimodal tasks, it is advisable to match image, video, or audio vectors to the embedding vectors not only of whole words but also of different variants of subwords.</p><p>Similarly, in addition to vectorizing entire images or videos, we suggest using a vector base of image fragments or parts of video frames. In particular, separately augmenting the vectorized base of video recordings by vectorizing the junctions of adjacent frames in video streams can be useful, allowing a better perception of the dynamics of interframe changes in video scenes. Additional embedding of fragments or parts of video frames clearly opens up new opportunities for deeper analysis of visual content. This is especially important for multimodal applications where visual and textual data must be matched, including embedding vectors not only for whole words but also for different variants of subwords. By analyzing individual fragments of images or parts of video frames, we can reveal details that may remain unnoticed when the complete image or video is analyzed. 
This provides a better understanding of the rendered scene, of elements in the background, and of smaller objects or actions occurring in the frame. Vectorizing the junctions between adjacent frames allows the dynamics of scenes to be perceived more holistically and predictively, capturing changes in the location of objects, facial expressions, or movements and providing information about the motion and interactions of all components of the video content. This significantly improves the model's ability to understand video, including its verbal description.</p><p>The positive effect of multimodal interaction in the proposed scheme is a stronger correspondence between visual and textual data. In multimodal applications, it is important to establish an exact correspondence between visual elements (images, videos) and textual data (words, phrases). Vectorizing both visual and textual content at a finer level gives the model the ability to better understand the relationships between different modalities. In addition, augmenting the vector base through the joint vectorization of frame junctions and subwords enriches the information space on which the model is trained, allowing it to adapt better to various tasks and contexts. This may include an improved ability to determine context, to understand intentions and emotions, and to gain additional degrees of freedom for generalization.</p><p>Although vectorizing image, audio, or video fragments increases the amount of data to process, efficient algorithms and performance-optimized architectures can help manage this increase. At the same time, effective coordination between the different modalities must be ensured, using approaches such as alignment or joint-representation algorithms to integrate and synchronize the vector spaces of visual and textual data. 
In general, developing and training models that effectively use the extended vector base will require advanced deep learning methods and the adaptation of existing architectures to new requirements. In particular, the use of a set of small language models within an MoE <ref type="bibr" target="#b19">(Slyusar et al., 2024)</ref>, each specializing in certain combinations of subword embeddings with niche modalities and bypassing the involvement of more universal large models, deserves attention.</p><p>These approaches open up new perspectives for creating more powerful and adaptive multimodal systems that can effectively handle complex tasks of analyzing, understanding, and generating diverse content.</p><p>Fine-tuning embeddings trained in other languages is a viable elaboration. It holds promise to benchmark the proposed method against the byte-pair-encoding technique and to establish a gold standard for cosine similarity between dictionary definitions and predicted terms. Utilizing the predicted terminology can augment machine translation systems, elevating translation quality. This methodology can extend to other Slavic and world languages, and the created subword vocabularies can expand beyond physics to encompass various domains, including general dictionaries. Ultimately, we anticipate the development of a neural network adept at autonomously suggesting terms for emerging concepts, representing an advanced AI technology capable of performing terminological tasks. However, these pursuits necessitate dedicated investigation and computation beyond the scope of this study.</p></div>
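The top-k routing principle discussed above (a gate scores all experts, activates only a few of them, and mixes their outputs with softmax weights) can be illustrated with a minimal sketch. The scalar "experts", the fixed gate scores, and the `moe_forward` helper are toy assumptions; a real MoE layer such as Mixtral's uses learned neural gates and tensor-valued experts.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_scores, top_k=2):
    # Rank experts by gate score and activate only the top_k of them;
    # the remaining experts stay idle, which is what saves compute.
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([gate_scores[i] for i in chosen])
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))

# Four toy experts standing in for word-level / subword-level model pairs.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
y = moe_forward(3.0, experts, gate_scores=[0.1, 2.0, 0.3, 2.0], top_k=2)
# Experts 1 and 3 tie for the highest score, so y = 0.5*(3+1) + 0.5*(3*3) = 6.5.
```

Only two of the four experts are evaluated for this input, which mirrors the cost saving of the switched structure in Fig. 3.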
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In this paper, we have introduced a novel method for subword segmentation that is essential for the pre-processing phase of reverse dictionary tasks and other natural language processing (NLP) challenges, thereby embodying the principles of terminology science within a machine learning framework. We have also established criteria for term suitability in a format compatible with machine processing and discussed possible ways to carry out machine learning to obtain a reverse dictionary on this basis.</p><p>The resulting subworded text mitigates errors commonly encountered in the widely used byte-pair-encoding algorithms, which rely solely on mathematical statistics. By employing symptomatic statistical and analytical techniques from terminology science within machine learning, we take a significant step towards executing various terminological tasks intelligently, effectively imparting human-like thinking to AI systems. Furthermore, a neural network trained to autonomously generate terms for novel concepts holds the potential to evolve into an advanced AI technology capable of handling all terminological work.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>Chaudhary et al., 2018; Zhang et al., 2020; Aguilar et al., 2021). Consequently, the decomposition of words into constituents has been investigated in various NLP tasks focusing on text generation, prediction, and speech recognition (Chaudhary et al., 2018; Sennrich et al., 2016; Arčan et al., 2019; Church, 2020).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Merging of a few trained LLMs</figDesc><graphic coords="9,86.20,72.00,425.30,168.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Switched mixture of experts (Chen et al., 2022)</figDesc><graphic coords="10,117.48,211.67,378.05,229.34" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Memory requirements vs. number of tokens for different quantization levels</figDesc><graphic coords="11,92.42,296.78,412.95,249.58" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>The main methods of merging LLMs (Goddard, 2024)</figDesc><table><row><cell>Method</cell><cell>Multi-Model</cell><cell>Uses base model</cell></row><row><cell>Linear (Model Soups) (Wortsman et al., 2022)</cell><cell>✅</cell><cell>❌</cell></row><row><cell>SLERP (Spherical Linear IntERPolation)</cell><cell>❌</cell><cell>✅</cell></row><row><cell>Task Arithmetic (Ilharco et al., 2023)</cell><cell>✅</cell><cell>✅</cell></row><row><cell>TIES (TrIm, Elect Sign &amp; Merge) (Yadav et al., 2023)</cell><cell>✅</cell><cell>✅</cell></row><row><cell>DARE (Drop And REscale) (Yu et al., 2023)</cell><cell>✅</cell><cell>✅</cell></row></table></figure>
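Among the methods in Table 1, TIES is representative of the interference-avoiding merges: task vectors (parameter deltas from a shared base model) are trimmed, a per-parameter sign is elected, and only sign-consistent entries are averaged. The `ties_merge` function below is a simplified interpretation of that procedure over plain Python lists, not the reference implementation from Yadav et al. (2023) or Mergekit.

```python
def ties_merge(base, deltas, trim_frac=0.5):
    """Simplified TIES: trim, elect sign, and merge task vectors onto a base."""
    merged = list(base)
    # 1) Trim: keep only the largest-magnitude fraction of each task vector.
    trimmed = []
    for d in deltas:
        k = max(1, int(len(d) * (1 - trim_frac)))
        keep = sorted(range(len(d)), key=lambda i: abs(d[i]), reverse=True)[:k]
        trimmed.append([d[i] if i in keep else 0.0 for i in range(len(d))])
    for i in range(len(base)):
        vals = [t[i] for t in trimmed if t[i] != 0.0]
        if not vals:
            continue
        # 2) Elect sign: the sign of the summed surviving entries wins.
        sign = 1.0 if sum(vals) >= 0 else -1.0
        agree = [v for v in vals if v * sign > 0]
        # 3) Merge: average only the entries that agree with the elected sign.
        merged[i] = base[i] + sign * sum(abs(v) for v in agree) / len(agree)
    return merged

base = [0.0, 0.0, 0.0, 0.0]
deltas = [[1.0, -0.2, 2.0, 0.0], [1.0, 0.3, -2.0, 0.1]]
merged = ties_merge(base, deltas, trim_frac=0.5)
# merged == [1.0, 0.0, 2.0, 0.0]: small entries are trimmed away and the
# conflicting signs at index 2 are resolved by the elected sign.
```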
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Memory requirements vs. number of tokens for different quantization levels</figDesc><table><row><cell>Tokens</cell><cell cols="3">Memory requirements (GB)</cell></row><row><cell></cell><cell>8 quants (Q8)</cell><cell>6 quants (Q6)</cell><cell>5 quants (Q5)</cell></row><row><cell>1024</cell><cell>7.60</cell><cell>5.92</cell><cell>5.15</cell></row><row><cell>2048</cell><cell>7.74</cell><cell>6.05</cell><cell>5.28</cell></row><row><cell>4096</cell><cell>7.99</cell><cell>6.32</cell><cell>5.54</cell></row><row><cell>8192</cell><cell>8.52</cell><cell>6.84</cell><cell>6.07</cell></row><row><cell>16384</cell><cell>9.57</cell><cell>7.89</cell><cell>7.12</cell></row><row><cell>32768</cell><cell>11.67</cell><cell>9.99</cell><cell>9.22</cell></row></table></figure>
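The linearity visible in Table 2 can be checked with an ordinary least-squares fit. Taking the Q8 column as given, the sketch below estimates a slope of roughly 1.3e-4 GB per token above a constant base of about 7.5 GB; reading the base as the cost of the quantized weights and the slope as the per-token context cost is an interpretation, not a figure stated in the source.

```python
def linear_fit(xs, ys):
    # Ordinary least-squares slope and intercept for y = a*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Q8 column of Table 2: memory (GB) vs. maximum number of tokens.
tokens = [1024, 2048, 4096, 8192, 16384, 32768]
mem_q8 = [7.60, 7.74, 7.99, 8.52, 9.57, 11.67]
slope, intercept = linear_fit(tokens, mem_q8)
# slope is about 1.3e-4 GB per token; intercept is about 7.5 GB.
```

Repeating the fit on the Q6 and Q5 columns gives nearly the same slope with a lower intercept, consistent with quantization shrinking the constant weight cost rather than the per-token growth.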
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Char2subword: Extending the Subword Embedding Space Using Robust Character Compositionality</title>
		<author>
			<persName><forename type="first">G</forename><surname>Aguilar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mccann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Niu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Rajani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Keskar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1640" to="1651" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Translating Terminological Expressions in Knowledge Bases with Neural Machine Translation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Arčan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Torregrosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Buitelaar</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1709.02184</idno>
		<imprint>
			<date type="published" when="2019-07-31">Jul 31, 2019</date>
		</imprint>
	</monogr>
	<note>cs.CL</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chaudhary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Levin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Neubig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mortensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carbonell</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D18-1366</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">E</forename><surname>Riloff</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Chiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Hockenmaier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tsujii</surname></persName>
		</editor>
		<meeting>the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="3285" to="3295" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Towards Understanding Mixture of Experts in Deep Learning</title>
		<author>
			<persName><forename type="first">Zixiang</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yihe</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yue</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Quanquan</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yuanzhi</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2208.02813</idno>
		<imprint>
			<date type="published" when="2022-08-04">04 August 2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Emerging Trends: Subwords, Seriously?</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">W</forename><surname>Church</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Natural Language Engineering</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page" from="375" to="382" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">Petra</forename><surname>Drewer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wolfgang</forename><surname>Ziegler</surname></persName>
		</author>
		<title level="m">Technische Dokumentation</title>
				<meeting><address><addrLine>Wuerzburg</addrLine></address></meeting>
		<imprint>
			<publisher>Vogel Buchverlag</publisher>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<idno>DSTU 9112:2021</idno>
		<title level="m">ISO 9:1995, NEQ), Kyrylychno-latynychna transliteracija i latynychnokyrylychna retransliteracija ukrajinsjkykh tekstiv. Pravyla napysannja (Cyrillic-Latin transliteration and Latin-Cyrillic retransliteration of Ukrainian texts. Writing rules)</title>
				<meeting><address><addrLine>UkrNDNC, Kyjiv</addrLine></address></meeting>
		<imprint>
			<publisher>DP</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>in Ukrainian</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Mergekit</title>
		<author>
			<persName><forename type="first">Charles</forename><surname>Goddard</surname></persName>
		</author>
		<ptr target="https://github.com/arcee-ai/mergekit" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Learning to Understand Phrases by Embedding the Dictionary</title>
		<author>
			<persName><forename type="first">F</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korhonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00080</idno>
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="17" to="30" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Editing Models with Task Arithmetic</title>
		<author>
			<persName><forename type="first">Gabriel</forename><surname>Ilharco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marco</forename><forename type="middle">Tulio</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mitchell</forename><surname>Wortsman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Suchin</forename><surname>Gururangan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ludwig</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hannaneh</forename><surname>Hajishirzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ali</forename><surname>Farhadi</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2212.04089</idno>
		<imprint>
			<date type="published" when="2023-03-31">31 Mar 2023</date>
		</imprint>
	</monogr>
	<note>cs.CL</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Slovnyk afiksaljnykh morfem ukrajinsjkoji movy (Dictionary of affixal morphemes of the Ukrainian language)</title>
		<author>
			<persName><forename type="first">Je</forename><surname>Karpilovsjka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Karpilovsjkyj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Klymenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nedozym</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">In-t movoznavstva im. O. O. Potebni (O. O. Potebnja Institute of Linguistics)</title>
		<imprint>
			<publisher>Kyjiv</publisher>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
	<note>in Ukrainian</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Large Terminological Databases</title>
		<author>
			<persName><forename type="first">M.-C</forename><surname>L'Homme</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Supplementary Volume Dictionaries. An International Encyclopedia of Lexicography: Supplementary Volume: Recent Developments with Focus on Electronic and Computational Lexicography</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Gouws</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Heid</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">W</forename><surname>Schweickard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Wiegand</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin, Boston</addrLine></address></meeting>
		<imprint>
			<publisher>De Gruyter Mouton</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1480" to="1486" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Analysis and Evaluation of Language Models for Word Sense Disambiguation</title>
		<author>
			<persName><forename type="first">D</forename><surname>Loureiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rezaee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Pilehvar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Camacho-Collados</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="387" to="443" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Mixtral of experts: A high quality Sparse Mixture-of-Experts</title>
		<author>
			<orgName>Mistral AI Team</orgName>
		</author>
		<ptr target="https://mistral.ai/news/mixtral-of-experts/" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Poljugha</surname></persName>
		</author>
		<title level="m">Slovnyk ukrajinsjkykh morfem (Dictionary of Ukrainian morphemes)</title>
				<meeting><address><addrLine>Svit, Ljviv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
	<note>in Ukrainian</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Neural Machine Translation of Rare Words with Subword Units</title>
		<author>
			<persName><forename type="first">R</forename><surname>Sennrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Haddow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Birch</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P16-1162</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Erk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</editor>
		<meeting>the 54th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Berlin, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1715" to="1725" />
		</imprint>
	</monogr>
	<note>Long Papers, Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Criteria for consciousness in humans and other mammals</title>
		<author>
			<persName><forename type="first">Anil</forename><forename type="middle">K</forename><surname>Seth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bernard</forename><forename type="middle">J</forename><surname>Baars</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">B</forename><surname>Edelman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Consciousness and cognition</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="119" to="139" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Main directions for implementation of the artificial intelligence strategy in Ukraine</title>
		<author>
			<persName><forename type="first">A</forename><surname>Shevchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kondratenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Slyusar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhukov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kondratenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vakulenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Information processing in control and decision-making systems, Problems and solutions</title>
				<editor>
			<persName><forename type="first">V</forename><surname>Vychuzhanin</surname></persName>
		</editor>
		<meeting><address><addrLine>Odesa, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="7" to="33" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Sikorsjka</surname></persName>
		</author>
		<title level="m">Ukrajinsjko-rosijsjkyj slovotvorchyj slovnyk: 2-ghe vyd. Slovnyk (Ukrainian-Russian word-making dictionary</title>
				<meeting><address><addrLine>Osvita, Kyjiv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
	<note>2nd edition. Dictionary. in Ukrainian</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Some Aspects of Artificial Intelligence Development Strategy for Mobile Technologies</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">I</forename><surname>Slyusar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ju</forename><forename type="middle">P</forename><surname>Kondratenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Shevchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">V</forename><surname>Jeroshenko</surname></persName>
		</author>
		<idno type="DOI">10.13052/jmm1550-4646.2031</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Mobile Multimedia</title>
		<imprint>
			<biblScope unit="volume">2024</biblScope>
			<biblScope unit="page" from="525" to="554" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">O</forename><surname>Vakulenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">V</forename><surname>Vakulenko</surname></persName>
		</author>
		<title level="m">Tlumachnyj slovnyk iz fizyky: [6644 statti</title>
				<imprint>
			<publisher>Kyjiv</publisher>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
	<note>VPC &quot;Kyjivsjkyj universytet. in Ukrainian</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Term and terminology: basic approaches, definitions, and investigation methods (Eastern-European perspective)</title>
		<author>
			<persName><forename type="first">M</forename><surname>Vakulenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Terminology Science &amp; Research</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="13" to="28" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">O</forename><surname>Vakulenko</surname></persName>
		</author>
		<title level="m">Suchasna ukrajinsjka terminologhija: metodologhija, kodyfikacija, leksykoghrafichna praktyka (Modern Ukrainian Terminology: Methodology, Codification, and Lexicographic Practice) (Specialty 10.02.01 - Ukrainian Language)</title>
				<meeting><address><addrLine>Kyjiv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
		<respStmt>
			<orgName>Taras Shevchenko National University of Kyiv</orgName>
		</respStmt>
		</respStmt>
	</monogr>
	<note>Dr. Sc. thesis. in Ukrainian</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Normalization of Ukrainian letters, numerals, and measures for natural language processing</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">O</forename><surname>Vakulenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="1307" to="1321" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Terminology Science in Machine Learning: Smart Subword Segmentation of Ukrainian Physical Texts</title>
		<author>
			<persName><forename type="first">Maksym</forename><surname>Vakulenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Horizons in Computer Science Research</title>
				<editor>
			<persName><forename type="first">Thomas</forename><forename type="middle">S</forename><surname>Clary</surname></persName>
		</editor>
		<meeting><address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<publisher>Nova Science Publishers, Inc</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="147" to="161" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time</title>
		<author>
			<persName><forename type="first">Mitchell</forename><surname>Wortsman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gabriel</forename><surname>Ilharco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Samir</forename><forename type="middle">Yitzhak</forename><surname>Gadre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rebecca</forename><surname>Roelofs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Raphael</forename><surname>Gontijo-Lopes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ari</forename><forename type="middle">S</forename><surname>Morcos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hongseok</forename><surname>Namkoong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ali</forename><surname>Farhadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yair</forename><surname>Carmon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simon</forename><surname>Kornblith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ludwig</forename><surname>Schmidt</surname></persName>
		</author>
		<idno>arXiv:2203.05482 [cs.LG]</idno>
		<imprint>
			<date type="published" when="2022-07-01">01 Jul 2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">TIES-Merging: Resolving Interference When Merging Models</title>
		<author>
			<persName><forename type="first">Prateek</forename><surname>Yadav</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Derek</forename><surname>Tam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leshem</forename><surname>Choshen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Colin</forename><surname>Raffel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohit</forename><surname>Bansal</surname></persName>
		</author>
		<idno>arXiv:2306.01708 [cs.LG]</idno>
		<imprint>
			<date type="published" when="2023-10-27">27 Oct 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">BERT for Monolingual and Cross-Lingual Reverse Dictionary</title>
		<author>
			<persName><forename type="first">H</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Deng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2020</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Cohn</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2020-11">November 2020</date>
			<biblScope unit="page" from="4329" to="4338" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch</title>
		<author>
			<persName><forename type="first">Le</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bowen</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Haiyang</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fei</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yongbin</forename><surname>Li</surname></persName>
		</author>
		<idno>arXiv:2311.03099 [cs.CL]</idno>
		<imprint>
			<date type="published" when="2023-11-06">06 Nov 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">C</forename><surname>Lipton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Smola</surname></persName>
		</author>
		<ptr target="https://d2l.ai/" />
		<title level="m">Dive into Deep Learning</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<author>
			<persName><forename type="first">Zhi-Hua</forename><surname>Zhou</surname></persName>
		</author>
		<title level="m">Ensemble Methods: Foundations and Algorithms (Chapman &amp; Hall/CRC Machine Learning &amp; Pattern Recognition)</title>
				<meeting><address><addrLine>Boca Raton, FL, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Taylor &amp; Francis Group</publisher>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
