Mathematical Model of Semantic Search and Search Optimization

Taras Basyuk [0000-0003-0813-0785], Andrii Vasyliuk [0000-0002-3666-7232], Vasyl Lytvyn [0000-0002-9676-0180]

Information Systems and Networks Department, Lviv Polytechnic National University, Lviv, Ukraine
Taras.M.Basyuk@lpnu.ua, Andrii.S.Vasyliuk@lpnu.ua, Vasyl.V.Lytvyn@lpnu.ua

Abstract. This article analyzes the existing semantic search technologies used by search engines and outlines the main tasks that arise in this area. It is shown that search engine optimization algorithms are best described with a unified mathematical apparatus, for which the algebra of algorithms was chosen. The result of the study is a synthesized model that makes it possible to evaluate the content of an online resource in terms of text similarity and that describes the process of forming ontology concepts in order to assess the possibilities of semantic information search. Recommendations were also formulated that should be followed in the process of search engine optimization using semantic search technology. The research creates the preconditions for designing the corresponding software units, verifying them and adapting them to operation in the global network.

Keywords: Internet resource, popularization, semantic search, search engines, algebra of algorithms.

1 Introduction

Search engines provide users with access to relevant information, selecting it from a variety of online resources whose number grows continuously from year to year. According to Netcraft, at the beginning of 2019 there were about 1.3 billion websites. A typical search engine merely looks for the query keywords and does not analyze the content itself [1]. The situation is further complicated by the fact that every language has such phenomena as synonymy (words with identical meanings) and polysemy (words with several meanings), which greatly increase the number of irrelevant results.
From this point of view, the task arises of analyzing the content of Internet resources in detail in order to minimize such situations. Semantic web technologies provide a variety of tools for solving this problem: they simplify search by means of standardized ontology languages and semantic search technologies that use ontologies as data stores [2]. In view of this, the relevant task of the study is to conduct research in the field of search engine optimization using semantic search technologies. At the same time, it is expedient to use a unified mathematical apparatus to describe algorithms for the promotion of Internet resources. The algebra of algorithms is proposed for this purpose [3]. The algebra of algorithms provides means for synthesizing and minimizing algorithm formulas, which subsequently makes it possible to synthesize the mathematical support on the basis of the properties of its operations and their transformations. Given these features, the algebra of algorithms is proposed as the means for creating the mathematical support for the process of search optimization using semantic search technologies.

1.1 Analysis of recent research and publications

Nowadays, search engines use a relevance model to evaluate how well a document matches the search query, and in most cases this model cannot cope with the task. This is primarily due to the approach used and to the evaluation of artificial criteria, such as the location of words on the page, their number, etc. [4,5].
The analysis of well-known research [1,6] has shown that the most popular technologies used in the process of finding relevant information are:
boolean search - a combination of elements that makes it possible to include documents containing certain words in the search results, or to exclude them, with the help of the logical operators AND (two or more elements or phrases must be present), NOT (an element or phrase is excluded) and OR (at least one of the elements must be in the description) [6];
wildcard search - involves the use of special characters ("*", "?") that stand in for letters when the query is written [7];
distance search - returns documents in which the keywords occur within a certain distance of each other, and is activated by the tilde sign ("~") [8];
"inaccurate" (fuzzy) search - finds pages that may match the search argument even if the latter is not identical to the information sought; for example, a fuzzy matching system may correct mistakes made while typing [9];
contextual search - determines the meaning of a word from the context of the page rather than from the single word, and is the basis of Crystal Semantics Textonomy [10].

In view of this analysis, it can be assumed that the set of approaches in use cannot claim to be universal, and search optimization remains a relevant task. This is especially true given the growing spread of semantic search technologies, which requires new approaches and methods. From this perspective, the research task is to develop the mathematical support for optimizing Internet resources for semantic search using the algebra of algorithms.

1.2 The main tasks of the study and their significance

The purpose of the research is to create the mathematical support for the process of search engine optimization of Internet resources using semantic search technologies.
The study will provide a means of promoting both existing and new online resources and will increase the number of targeted visitors and, hence, the number of conversions. To achieve this goal, the following main tasks need to be solved:
analyze the existing semantic search technologies used by search engines and identify the main tasks that arise in this process;
synthesize models that can be used to evaluate the content of an online resource in terms of text similarity;
formulate recommendations that need to be followed in the process of search optimization using semantic search technology.
The results of the research address the relevant problem of creating the mathematical support for the process of search engine optimization of Internet resources using semantic search technologies.

2 Major research results

The term semantic search describes the attempts made by search engines to understand user queries. It is, however, much broader than ordinary search and includes the context the user is in at the moment the search query is entered. For example, if the user enters the word "university" and the previous query was "Lviv", then it is probable that the user is looking for information about universities in Lviv. In addition, an essential condition for contemporary semantic search is the use of the concept of entities, which are associated with people, events or places [11]. For example, Lviv Polytechnic National University is an entity characterized by its address, the number of buildings, the variety of institutes, fields of study, specialties, etc. Therefore, if a search for "Lviv Polytechnic National University" is followed by the query "which current specialties", the search engine displays the specialties of this university in the search results.
The need for the concept of entities stems primarily from voice search technology, in which an important feature is not only "understanding" the user's request but also determining its "intention" in order to return the most relevant result. In view of this, the approaches to optimizing Internet resources are changing radically. Namely, instead of writing content around competitive keywords, one first needs to analyze what the user wants and to create content that is relevant in nature and answers the questions of the target audience [12]. This approach facilitates the search engine promotion of the Internet resource being designed not only for classical text queries but also for semantic search technologies. Semantic search emerged from the notion of a semantic network, which is mainly based on ontologies. In the context of computer science, an ontology defines a set of entities (classes, attributes and relationships) through which the domain is modeled. Since ontologies do not depend on lower-level data models, they are used in the process of integrating heterogeneous databases, which provides tools for analyzing specific queries on the basis of the relationships between related factors [13].
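The entity-based, context-aware lookup described above can be illustrated with a small set of subject-predicate-object triples. This is only a minimal sketch under stated assumptions: the predicate names, the keyword-to-predicate table and the placeholder values "specialty A" and "specialty B" are hypothetical and are not taken from any real search engine or from the authors' model.

```python
# Hypothetical knowledge base: (subject, predicate, object) triples.
# The specialty values are placeholders, not real data.
TRIPLES = [
    ("Lviv Polytechnic National University", "type", "University"),
    ("Lviv Polytechnic National University", "located_in", "Lviv"),
    ("Lviv Polytechnic National University", "has_specialty", "specialty A"),
    ("Lviv Polytechnic National University", "has_specialty", "specialty B"),
]

# Hypothetical mapping from query keywords to predicates.
PREDICATE_KEYWORDS = {"specialties": "has_specialty", "where": "located_in"}


def answer(query, triples, context):
    """Answer a query, remembering the last entity mentioned in `context`."""
    text = query.lower()

    # If the query names a known entity, store it as the session context.
    for subject in {s for s, _, _ in triples}:
        if subject.lower() in text:
            context["entity"] = subject

    entity = context.get("entity")
    if entity is None:
        return []

    # Use the remembered entity to interpret a follow-up question.
    for keyword, predicate in PREDICATE_KEYWORDS.items():
        if keyword in text:
            return [o for s, p, o in triples if s == entity and p == predicate]
    return []


context = {}
answer("Lviv Polytechnic National University", TRIPLES, context)
followup = answer("which current specialties", TRIPLES, context)
print(followup)
```

The second query does not mention the university at all; it is resolved only because the previous query left the entity in the session context, which mirrors the behaviour described for the "university" and "Lviv" example.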
An analysis of Google's search engine showed that the content manipulation methods used in text spinning are relatively easily recognized by the search engine through the following mechanisms: Latent Semantic Indexing (a method of indexing websites in which the search takes into account the overall content of the text rather than its saturation with keywords); latent Dirichlet allocation, which makes it possible to estimate the probability of the appearance of documents or terms beyond the text collection and to identify the corresponding parameters; and term frequency-inverse document frequency (TF-IDF), a measure in which, to determine the similarity between texts, for each pair (word of the current text, text with which the comparison is made) the frequency of occurrence of the word in the given text is found together with the inverse document frequency. This indicates that classical approaches to search optimization are no longer applicable, because the search engine identifies the statistical features of word repetition in a particular context and builds the semantic correlations that underlie relevance-assessment technologies [14]. The analysis of the mechanisms used by search engines to find relevant responses makes it possible to formulate an algorithm for evaluating the content of an online resource by comparing its texts with those of other resources. The results obtained will subsequently be used in the process of constructing ontology concepts, which will provide tools for evaluating the content of the online resource against the requirements of semantic search. At the first stage, the text is pre-processed, namely transformed into a data vector. Further processing consists of a stemming operation (cutting off the endings and suffixes of words) and the exclusion of non-informative phrases.
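The pre-processing stage described above (transforming the text into a data vector, stemming, and excluding non-informative items) might look roughly as follows. This is a hedged sketch, not the authors' implementation: the suffix list and the set of non-informative words are hypothetical English examples chosen for readability, and the suffix stripping is deliberately crude.

```python
import re

# Hypothetical suffix list, for illustration only; these are not the
# rules of the Porter stemmer or of any published Ukrainian stemmer.
SUFFIXES = ("ing", "es", "ed", "s")

# Hypothetical set of non-informative words.
NON_INFORMATIVE = {"the", "a", "of", "and", "in"}


def to_vector(text):
    """Split the text into a list (vector) of lowercase word tokens."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())


def strip_suffix(word, min_stem=3):
    """Cut off the first matching suffix if the remaining stem is long enough."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Vectorize, drop non-informative words, then stem what is left."""
    return [strip_suffix(w) for w in to_vector(text) if w not in NON_INFORMATIVE]


print(preprocess("Searching the indexes of links"))
```

Non-informative words are removed before stemming so that the lookup is done on the surface form; a production pipeline would replace both hypothetical lists with language-specific resources.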
Stemming methods are widespread in the global network and are widely used by search engines to evaluate the similarity of texts and to return relevant information [8]. The literature currently describes a number of stemmers that perform morphological analysis (Stemka, MyStem, Pymorphy) or strip word endings (Porter stemmer, Paice/Husk stemmer), but in most cases they are tied to particular languages, and Ukrainian is not among them. In view of this, it is proposed to use as the stemmer the improved method described by Golub T. [15], which is based on a modification of the Porter algorithm [16] and does not require generated databases, which reduces both the hardware requirements and the number of calculations performed. The formula of the modified Porter stemmer algorithm was synthesized in three stages: synthesis of sequences, synthesis of eliminations and minimization of the algorithm [17,18].

Synthesis of sequences. The following uniterms need to be described: R - reading the word, N - converting the characters to lowercase, D(a) - removing the apostrophe, D(s) - removing the part of the word from the vowel-consonant pair, D(z) - deleting the ending, D(g) - deleting the vowel, D(p) - removing one consonant, D(m) - deleting the soft sign. The algorithm under consideration contains 30 sequences.
Each of them describes one of the following cases:
S1 - execution of the algorithm when the word has no apostrophe, no ending, no vowel at the end, no double consonant and no soft sign;
S2 - the same as S1, except that the word has an apostrophe;
S3 - the word has an ending;
S4 - the word has an apostrophe and an ending;
S5 - under all the conditions described in S1, the word has only a vowel at the end;
S6 - the same as S5, but the word has an apostrophe;
S7 - the word has an ending and a vowel at the end;
S8 - the word has a vowel at the end, an ending and an apostrophe;
S9 - the word has only a double consonant;
S10 - the word has a double consonant and an apostrophe;
S11 - the word has a double consonant and an ending;
S12 - the word has a double consonant, an ending and an apostrophe;
S13 - the word has an ending, a vowel at the end and a double consonant;
S14 - the word has an ending, a vowel at the end, a double consonant and an apostrophe;
S15 - the word has a vowel at the end and a double consonant;
S16 - the same as S15, except that the word has an apostrophe;
S17 - the word has a soft sign at the end;
S18 - the word has a soft sign at the end and an apostrophe;
S19 - the word has an ending and a soft sign;
S20 - the word has an ending, a soft sign and an apostrophe;
S21 - the word has a soft sign at the end and a vowel at the end;
S22 - the word has a soft sign at the end, a vowel at the end and an apostrophe;
S23 - the word has a double consonant and a soft sign at the end;
S24 - the word has a double consonant, a soft sign at the end and an apostrophe;
S25 - the word has an ending, a vowel at the end, a double consonant and a soft sign at the end;
S26 - all of the listed cases are present in the word;
S27 - the word has a vowel at the end, a double consonant and a soft sign at the end;
S28 - the word has a vowel at the end, a double consonant, a soft sign at the end and an apostrophe;
S29 - the word has an ending, a double consonant and a soft sign at the end;
S30 - the word has an ending, a double consonant, a soft sign at the end and an apostrophe.
The corresponding sequences are given below. After substituting the sequences and minimizing the algorithm, we obtain the following formula of the modified Porter stemmer algorithm:

The next stage involves removing the so-called stop words from the generated vector. Stop words are words that carry no semantic load but without which it is impossible to construct meaningful content. These include prepositions, pronouns, exclamations, punctuation marks, etc. [19]. As search engines are continuously improving, stop-word lists change as well, so they must be updated constantly and their ratio to the total number of words in the content recalculated. A large number of stop words in a text has a negative effect on how the user evaluates it and creates the impression of meaningless content. The reverse situation, when the text contains too few stop words (content created solely for search engines), also harms readability and causes the user to lose interest.

The next step is to determine the similarity of the text to a reference text. To determine the degree of similarity between texts, it is proposed to use the statistical measure TF-IDF [10], which weights the frequency of occurrence of a word in the given content by its inverse document frequency. Next, the most meaningful words (keywords) in the content are selected; they form the object, subject and predicate from which possible query/answer patterns are built [20-23]. The words found are displayed as an ordered list with links to the text paragraphs where they occur. To present complete information at this step, the lexeme of each word is displayed, indicating the objects it denotes and the concept associated with them.
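The stop-word ratio, the TF-IDF weighting and the keyword selection described above can be sketched as follows. This is an illustrative sketch only: the source does not specify which TF-IDF variant is used, so the code assumes one common smoothed form, idf = ln((1 + N) / (1 + df)) + 1, and compares texts by cosine similarity, which is also an assumption. All function names are hypothetical.

```python
import math
from collections import Counter


def stop_word_ratio(tokens, stop_words):
    """Share of tokens that are stop words (0.0 for an empty text)."""
    if not tokens:
        return 0.0
    return sum(t in stop_words for t in tokens) / len(tokens)


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln((1 + N) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse TF-IDF vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_keywords(vector, k=5):
    """The k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    ["semantic", "search", "ontology", "ontology"],
    ["semantic", "search", "engine"],
    ["porter", "stemmer"],
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
print(top_keywords(vectors[0], k=1))
```

Texts that share no terms score exactly zero, while texts with overlapping vocabulary score between zero and one; the threshold at which two texts count as "similar" would have to be chosen for the specific resource.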
The lexical meaning reflects the distinctive, individual features of the subject. It is proposed to output these by displaying the original text, automatically positioned on the fragment found, with the keyword highlighted. The final stage of the algorithm is the construction of an ontology, in which the classes and their hierarchy are defined. Next, the properties of each class, the restrictions on them and the types of property values are determined. The result of this step is a set of concepts and relationships in the form of triples that conforms to the RDF (Schema) standard and can be translated into the OWL language [13]. To make the constructed ontological model easier to evaluate, it is proposed to visualize it using the OntoViz module. Upon completion of the construction, it is proposed to use the FaCT++ reasoner to identify possible inconsistencies in the ontology and, in the long run, to compare it with the linguistic base for the Ukrainian language (Ukrainian WordNet) [24], which will help to assess the completeness of content coverage and its conformity with the requirements of semantic search.

To synthesize the formula for the search engine optimization of a resource for semantic search, the following uniterms must be described: F(v) - creating the data vector, F(s) - performing the stemming operation, F(d) - removing stop words, F(mS) - measuring the degree of similarity, F(kL) - forming the list of keywords and displaying the lexemes, F(o) - constructing and visualizing the ontology of the model, and F(kk) - creating "useful content". The linear actions are described by the sequences S1 and S2: These sequences are combined by the elimination L1, whose condition checks whether the ratio of total content to stop words is within the permissible range.
After substituting the corresponding sequences into the elimination, we obtain the following formula of the algorithm: As a result of minimizing the algorithm formula by the number of uniterms, we obtain the formula for the search optimization of the resource for semantic search.

As can be seen, semantic search is extremely important in the process of conducting an SEO campaign. In view of this, the analysis of known strategies made it possible to formulate recommendations that should be followed in the process of search engine optimization using semantic search technologies:

Creating quality content. Modern search engines implement artificial intelligence methods in order to support a dialogue with the user. To perform this function, they need a large body of information, reference points and expert content. From this point of view, it is necessary to create authoritative content in the relevant subject area and to become a source of expert information, so that search engines can refer to the promoted resource.

Orientation to the answer. A necessary condition is content built around questions and answers. The research showed that search engines prefer to display information in the form of numbered lists or step-by-step instructions that respond to users' questions beginning with the words "how to", "why", "what" and so on.

Technical structuring of content. Structured data markup annotates the pages of the online resource, making them understandable to search engines. Structured markup not only gives search engines the opportunity to understand the content better, but also improves search quality by displaying results in a snippet (zero position) that gives the user additional information about the content of the page and improves the click-through rate under semantic search. It is advisable to verify the technical structure using the Structured Data Testing Tool.

Use of internal links.
Internal links have played and continue to play a significant role in creating a positive user experience by providing navigation across the online resource. To this end, it is necessary to link landing pages, add contextual links to important content elements, prevent the occurrence of broken links, etc.

3 Conclusion

As a result of the research, the existing semantic search technologies used by search engines were analyzed and the main problems that arise in this area were identified. Models were synthesized that make it possible to evaluate the content of an online resource in terms of the similarity of its texts to those of other resources and that describe the process of forming ontology concepts in order to assess the possibilities of semantic information search. Unlike classical tools, the algebra of algorithms provides the means to minimize these models by the number of uniterms and to study the corresponding mathematical models. Recommendations were formulated that should be followed in the process of search engine optimization using semantic search technologies. Further research will focus on the design of the relevant software units, their verification and their adaptation to operation in the global network.

References

1. Grappone, J.: Search Engine Optimization (SEO): An Hour a Day. In: United States, Wiley Publishing. (2013)
2. Su, J., Sachenko, A., Lytvyn, V., Vysotska, V., Dosyn, D.: Model of Touristic Information Resources Integration According to User Needs. In: International Scientific and Technical Conference on Computer Sciences and Information Technologies, 113-116 (2018)
3. Ovsyak, V.: Algorithms: methods of construction, optimization, probability research. In: Lviv, Svit. (2001) (In Ukrainian)
4. Basyuk, T.: Popularization of website and without anchor promotion. In: International Scientific and Technical Conference on Computer Science and Information Technologies (CSIT), 193-195 (2016)
5. Basyuk, T.: Innerlinking website pages and weight of links.
In: International Scientific and Technical Conference on Computer Science and Information Technologies (CSIT), 12-15 (2017)
6. Amerland, D.: Google Semantic Search: Search Engine Optimization (SEO) Techniques That Get Your Company More Traffic, Increase Brand Impact, and Amplify Your Online Presence. In: United States, Que Publishing. (2013)
7. Vysotska, V., Basto-Fernandes, V., Lytvyn, V., Emmerich, M., Hrendus, M.: Method for Determining Linguometric Coefficient Dynamics of Ukrainian Text Content Authorship. In: International Conference on Computer Science and Information Technologies (CSIT), 132-151 (2019)
8. Najman, L., Talbot, H.: Mathematical Morphology: From Theory to Applications. In: United Kingdom, Wiley-ISTE. (2010)
9. Shih, F.Y.: Image Processing and Mathematical Morphology: Fundamentals and Applications. In: United States, CRC Press. (2009)
10. Jones, K.: A statistical interpretation of term specificity and its application in retrieval. In: Journal of Documentation, vol. 60(5), 493-502 (2004)
11. Basyuk, T.: The Popularization Problem of Websites and Analysis of Competitors. In: Advances in Intelligent Systems and Computing II (CSIT), vol. 689, 54-65 (2018)
12. Bailin, A., Grafstein, A.: Readability: Text and Context. In: Palgrave Macmillan. (2016)
13. Gašević, D., Djurić, D., Devedžić, V., Selic, B., Bézivin, J.: Model Driven Engineering and Ontology Development. In: Springer. (2009)
14. Bast, H., Buchhold, B., Haussmann, E.: Semantic Search on Text and Knowledge Bases (Foundations and Trends in Information Retrieval). In: US, Now Publishers Inc. (2016)
15. Golub, T., Tyagunova, Yu.: The method of Ukrainian language stemming for the classification of documents based on Porter's algorithm. In: Scientific works of the Donetsk National Technical University, vol. 1, 59-63 (2017) (In Ukrainian)
16. Porter, M.: An algorithm for suffix stripping. In: Program, vol. 40(3), 211-218 (2006)
17.
Vysotska, V., Fernandes, V.B., Emmerich, M.: Web content support method in electronic business systems. In: CEUR Workshop Proceedings, Vol-2136, 20-41 (2018)
18. Vysotska, V., Hasko, R., Kuchkovskiy, V.: Process analysis in electronic content commerce system. In: Proceedings of the International Conference on Computer Sciences and Information Technologies, CSIT 2015, 120-123 (2015)
19. Vysotska, V., Kanishcheva, O., Hlavcheva, Y.: Authorship Identification of the Scientific Text in Ukrainian with Using the Lingvometry Methods. In: International Conference on Computer Science and Information Technologies (CSIT), 34-38 (2018)
20. Basyuk, T.: Popularization of Internet resources by using "featured snippets". In: International conference System Analysis and Information Technology, 190-191 (2018)
21. Korobchinsky, M., Vysotska, V., Chyrun, L., Chyrun, L.: Peculiarities of Content Forming and Analysis in Internet Newspaper Covering Music News. In: Computer Science and Information Technologies, Proc. of the Int. Conf. CSIT, 52-57 (2017)
22. Kanishcheva, O., Vysotska, V., Chyrun, L., Gozhyj, A.: Method of Integration and Content Management of the Information Resources Network. In: Advances in Intelligent Systems and Computing, 689, Springer, 204-216 (2018)
23. Naum, O., Chyrun, L., Kanishcheva, O., Vysotska, V.: Intellectual System Design for Content Formation. In: Computer Science and Information Technologies, Proc. of the Int. Conf. CSIT, 131-138 (2017)
24. Anisimov, A., Marchenko, O., Nikonenko, A., Porkhun, E., Taranukha, V.: Ukrainian WordNet: Creation and Filling. In: International Conference on Flexible Query Answering Systems (FQAS), 649-660 (2013)