Using a Logical Derivative to Determine the Information Content of Object Properties in Speech Recognition Tasks

Larisa Lyutikova a,b

a Institute of Applied Mathematics and Automation KBSC RAS, 360000, KBR, Nalchik, Shortanova str., 89A
b Institute for Computer Science and Problems of Regional Management KBSC RAS, 360000, KBR, Nalchik, I. Armand st., 37A

Abstract
This paper offers an approach for determining the information content of object properties in recognition tasks. The approach operates not on the subject area itself, where objects and their characteristics are specified, but on a trained Σπ-neural network that works correctly on that subject area. We propose a method for constructing a decision function from the weight characteristics of a correctly functioning Σπ-neuron. A logical derivative is used to evaluate the significance of object characteristics: it makes it possible to track how the decision function changes its value if one or more object characteristics change theirs. This allows conclusions to be drawn about the most important properties of the subject area under consideration.

Keywords
Decision function, Boolean derivative, data analysis, Σπ-neuron, decision trees, corrective operations

1. Introduction

The research area of this work is pattern recognition, whose purpose is to classify objects into several categories or classes. The practicality of this method is obvious: after classification, working with a particular class of information requires fewer resources than working with its full volume. In practice, solving problems related to pattern recognition is a complex theoretical and practical task, primarily because each specific case has its own specifics, which prevents the creation of a universal algorithm for working with information.
Today, neural networks are one of the most popular tools for solving problems for which there is no acceptable mathematical model or exact algorithm, but for which there are many so-called heuristics that can be used to extract more or less accurate patterns inherent in the studied subject area [4, 5]. The data used to find patterns are not perfect: they are usually incomplete and contain many inaccuracies and distortions. Although neural networks handle a great variety of such tasks well, their decision-making rules are not clear to the user; only the structure and the weight characteristics that the network acquired as a result of training are available. Identifying logical connections from the characteristics of a correctly functioning neural network is an important task, since neural networks are built on heuristics and their solutions can be ambiguous. Building object models also takes many training cycles, which entails long time costs, and training can reach a dead end; the problem of overfitting is acute. To address these problems, corrective methods are important, since they provide an opportunity to gain new knowledge about the patterns in the subject area under study. This in turn helps to better understand the nature of the data being studied, and therefore to develop algorithms that are more expressive. To date, four main methods of pattern recognition have been identified: comparison with a standard, the statistical method, artificial neural networks (ANNs), and the structural method [2, 4].

YRID-2020: International Workshop on Data Mining and Knowledge Engineering, October 15-16, 2020, Stavropol, Russia
EMAIL: lylarisa@yandex.ru (Larisa Lyutikova)
ORCID: 0000-0003-4941-7854 (Larisa Lyutikova)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
However, these methods have a number of obvious disadvantages. Comparison with a standard suffers from distortions of the samples under consideration, which requires accounting for a considerable number of small deviations from the standard. Statistical methods face difficulties in practical problems, since samples from each class are often not representative enough and probability distribution density functions are difficult to construct. Neural networks require long training over many examples. Structural methods are sensitive to distortions of the recognized images and require a complex procedure for constructing a feature set. This situation leads to the need to combine these methods so that the advantages of some approaches compensate for the shortcomings of others, which is actively done in solving practical problems.

2. The purpose and objectives of the study

This paper offers a combined approach to pattern recognition problems that makes it possible to correct the operation of a neural network. Given an already trained neural network, even if the subject area is not known, one can, from the weight characteristics of the neuron alone, obtain the logical connections in the data under study, build a logical decision function for the given area, and thus pass to structural methods of solution. The Boolean derivative is then used to identify the most important characteristics for each object and for the subject area as a whole. This yields a more accurate picture of the area under study, and hence the ability to work with it. For example, in pattern recognition this approach makes it possible to build a logical classifier function that can correct the operation of the network using only the weights of the neural network. Further investigation of this function reveals the most important features of each object.
It is also possible to show how changing the values of certain attributes results in changes in objects.

3. Research methods

Methods of training the Sigma-Pi (Σπ) neuron are described by such authors as A.V. Timofeev and Z.M. Shibzukhov; A.V. Timofeev proposed training the network in one pass. The construction of the classifier function will be based on the study of the weight values of the Σπ-neural network. As a solver (a function or method that divides the subject area into classes), we consider the Σπ-neuron (Sigma-Pi neuron), which is a more expressive generalization of classical neural networks, since it corresponds more closely to natural neurons. It can be represented by the following function of the input signals:

sp(x1, ..., xn) = Σi wi ∏j∈Ji xj,

where {w1, w2, ..., wk} are the coefficients that the neuron acquires in the learning process, and each product is taken over the subset Ji of inputs entering the i-th term. If a Σπ-neuron recognizes k elements from the specified subject area Y = {y1, y2, ..., yk}, they are generated by the corresponding set of attributes {X1, ..., Xk} [1].

EXAMPLE. Suppose the area for training a Sigma-Pi neuron is represented by the following set of features and objects:

Table 1. Example.
x1  x2  x3  y
0   0   1   a (2)
0   1   1   b (4)
1   1   0   c (6)

The attribute vectors are X = {X1 = (0,0,1), X2 = (0,1,1), X3 = (1,1,0)}. The objects are {a, b, c}, encoded accordingly as a = 2, b = 4, c = 6. Encoding objects with numerical values is necessary for fast training of this neural network. After training, which is performed in one pass, the Σπ-neuron takes the form:

sp(x1, x2, x3) = 2x3 + 2x2x3 + 4x2x1

Any query (x1, x2, x3) that is represented in the table will be identified with its corresponding object. If the query does not match the values of the variables in the training sample, for example (0,1,0), the result may be incorrect or absent altogether:

sp(0,1,0) = 2·0 + 2·1·0 + 4·1·0 = 0.
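The evaluation of the trained Σπ-neuron above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original paper; the term-to-weight mapping is taken from the example, with each term represented by the set of input indices it multiplies:

```python
from math import prod

# Trained weights from the example: sp(x1,x2,x3) = 2*x3 + 2*x2*x3 + 4*x1*x2.
# Each term is a (set of 0-based input indices, weight) pair.
TERMS = [({2}, 2),      # 2*x3
         ({1, 2}, 2),   # 2*x2*x3
         ({0, 1}, 4)]   # 4*x1*x2

def sp(x):
    """Evaluate the Sigma-Pi neuron: a sum of weighted products of inputs."""
    return sum(w * prod(x[i] for i in idx) for idx, w in TERMS)

# Queries from the training table are identified with their object codes:
print(sp((0, 0, 1)))  # 2 -> object a
print(sp((0, 1, 1)))  # 4 -> object b
# A query outside the training sample may yield no valid object code:
print(sp((0, 1, 0)))  # 0 -> the network recognizes no element
```

Running the last query reproduces the situation described next: the output 0 corresponds to no object code, so the bare neuron gives no answer.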
The network does not recognize any element, because no object with exactly these characteristics is in the data. However, it could be object b (4) or c (6) if there are inaccuracies, noise, or interference in the data. Therefore, to obtain a possible class of solutions, the trained neuron requires additional corrective approaches.

4. Construction of a decision logical function based on the structure of a Σπ-neuron and identification of significant features

When constructing the decision function, the training sample is not needed; it is enough to know the values of the weights and the structure of the Σπ-neuron. The function is constructed using a tree whose construction algorithm is described in [1]. The number of levels equals the largest number of variables multiplied together in any term, plus 1; in the example there are 3. The variables {x1, x2, ..., xn} are arranged at the lowest level of the tree. The second layer holds the coefficients of terms with one variable, the third the coefficients of terms with two variables, and so on. The value of each node is taken as yk+1 = wk+1 + Σ yi, where i ranges over the indexes of the objects whose variables are included as factors in the term corresponding to yk+1.

EXAMPLE. Let us build a tree for defining the main properties of objects for

sp(x1, x2, x3) = 2x3 + 2x2x3 + 4x2x1

By building such a tree from the trained weights, the basic rules relating objects to their characteristics can be identified (see Figure 1).

Figure 1: Decision tree for the given subject area

The connection of object yk with each property xi has the form P(yk) & P(yk-1) & ... & P(yi) & xi, where P(yk) = 1 if y = yk and P(yk) = 0 if y ≠ yk. For this example, the minimum set of rules looks like:

F(x1, x2, x3) = P(6)x1 ∨ P(6)P(4)x2 ∨ P(4)P(2)x3

These rules are sufficient if only the presence of an attribute is important for the data under consideration.
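One plausible reading of these rules (my interpretation, not spelled out in the paper) is that each conjunction P(yk)...P(yi)xi maps an active attribute xi to a set of candidate object codes, and a query is resolved by intersecting the candidate sets of its active attributes. A minimal Python sketch under that assumption, using the rule set of the example (x1 → {6}, x2 → {4, 6}, x3 → {2, 4}):

```python
# Hypothetical reading of the extracted rules: each attribute, when present,
# narrows the query down to a set of candidate object codes.
RULES = {0: {6}, 1: {4, 6}, 2: {2, 4}}  # attribute index -> candidate codes

def resolve(x):
    """Intersect the candidate sets of all attributes present in the query."""
    candidates = None
    for i, bit in enumerate(x):
        if bit:
            candidates = RULES[i] if candidates is None else candidates & RULES[i]
    return candidates or set()

print(resolve((0, 1, 1)))  # {4} -> object b
print(resolve((1, 1, 0)))  # {6} -> object c
# The query (0,1,0), unrecognized by the bare neuron, now yields candidates:
print(resolve((0, 1, 0)))  # b (4) or c (6)
```

On the training queries the resolver agrees with the neuron, and on the noisy query (0,1,0) it returns exactly the corrective candidate set {b, c} discussed above.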
However, these rules are not enough if a zero value of a variable is also informative for decision-making, nor in the case of multi-valued encoding. In such cases additional trees, or imaginary paths, must be built; in the figure these are indicated with a dash-dotted line (see Figure 2).

Figure 2: Object part of a logical function

If an object lacks some attribute, the path to it is marked with a dashed line. The decision function for our example then looks like this:

F(x1, x2, x3) = P(6)x1x̄3 ∨ P(6)P(4)x2 ∨ P(4)P(2)x3x̄1

To identify the significant variables in this function, i.e. the most important features of the source data, we use the logical derivative. Logical differentiation here means differentiation of Boolean functions, which is in some sense analogous to classical differentiation [7, 8]. Semantically, the Boolean derivative shows the degree to which a function depends on a given variable, and indicates how justified the expectation is that the function changes its value when the value of the variable changes.

Definition 1. The Boolean derivative ∂f/∂xi of a Boolean function f(x1, ..., xn) with respect to the variable xi is the modulo-2 sum of the corresponding residual functions:

∂f/∂xi = f(x1, ..., xi-1, 0, xi+1, ..., xn) ⊕ f(x1, ..., xi-1, 1, xi+1, ..., xn)

Definition 2. The weight P(∂f/∂xi) of the derivative of a Boolean function is the number of ones ("1") in the column of values of the derivative.

Statement 1. The weight of the derivative for a given variable shows how much the function f(x1, ..., xn) depends on the variable xi in comparison with the other variables.

Definition 3. An expression of the form

∂kf/∂(x1...xk) = ∂/∂xk (∂k-1f/∂(x1...xk-1))

is called the mixed derivative of the k-th order with respect to the corresponding variables. The order in which the variables are fixed does not matter.
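Definitions 1-3 can be checked numerically by enumerating the truth table. A minimal sketch follows; the function f in it is an illustrative stand-in chosen for the demo, not the paper's decision function (whose predicates P(y) depend on the object code):

```python
from itertools import product

def boolean_derivative(f, i):
    """Return df/dx_i as a new Boolean function (Definition 1)."""
    def df(*x):
        at0 = list(x); at0[i] = 0
        at1 = list(x); at1[i] = 1
        return f(*at0) ^ f(*at1)  # modulo-2 sum of the residual functions
    return df

def weight(f, n):
    """Number of ones in the truth-table column of f (Definition 2)."""
    return sum(f(*x) for x in product((0, 1), repeat=n))

# Illustrative function for the demo: f = x1 OR (x2 AND x3).
f = lambda x1, x2, x3: x1 | (x2 & x3)

for i, name in enumerate(("x1", "x2", "x3")):
    print(name, weight(boolean_derivative(f, i), 3))
# x1 has the largest derivative weight (6 vs 2), so f depends on it most.

# Mixed derivative of order 2 (Definition 3): the order of variables is irrelevant.
d12 = boolean_derivative(boolean_derivative(f, 0), 1)
d21 = boolean_derivative(boolean_derivative(f, 1), 0)
assert all(d12(*x) == d21(*x) for x in product((0, 1), repeat=3))
```

The derivative weights reproduce Statement 1: the variable with the heaviest derivative column is the one the function depends on most, which is exactly the significance criterion used below.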
A mixed Boolean derivative with respect to k variables indicates the conditions under which a function changes its value when the values of x1, ..., xk change simultaneously. If, after the logical decision function is constructed, it is necessary to determine the most important characteristics of the objects in the given subject area, one can consider the Boolean derivatives with respect to each variable and thus select the most significant properties. For our example:

∂f/∂x1 = (P(6)x̄3 ∨ P(6)P(4)x2) ⊕ (P(6)P(4)x2 ∨ P(4)P(2)x3) = P(4)P(2)x3 ∨ P(6)x̄3

The resulting derivative can classify objects by the variable x3. The derivative with respect to the variable x2 is

∂f/∂x2 = (P(6)x̄3x1 ∨ P(2)P(4)x̄1x3) ⊕ (P(2)P(4)x̄1x3 ∨ P(6)x̄3x1 ∨ P(4)P(6)) = P(6)P(4)x̄3x̄1 ∨ P(6)P(4)x1x3

This result gives conflicting data about only two objects and makes it impossible to classify them. Therefore, it can be argued that the variable x2 reflects the most important properties of the data under study, while the variables x3 and x1 are dependent, i.e. they are ensemble variables with x1 = x̄3. As a result of the analysis, we can say that, knowing the weight values of the neural network, we can build a logical function that reflects the patterns in the data. These patterns may not be obvious when conventional domain analysis methods are used. The study of the obtained logical function can reveal the most significant features for each object, and for the considered data as a whole.

5. Comparison with well-known attention methods in neural networks

The successfully implemented attention technique is a way to tell the network what to pay more attention to, that is, to report the probability of a particular outcome depending on the state of the neurons and the incoming data. The Attention layer implemented in Keras itself identifies, from the training sample, the factors that reduce the network error.
The identification of important factors is performed through backpropagation of the error, just as is done for convolutional networks. When the network is trained on data, importance becomes a function of the probability of an outcome given the data received by the network. The method proposed here instead identifies logical connections between the objects of the training sample without the training sample itself, using only the trained neural network formed from it. In contrast to attention, the proposed approach identifies logical rather than statistical patterns in the data, and it avoids repeating the backpropagation procedure many times. Knowledge of logical patterns, as opposed to statistical ones, makes it possible to formalize and more accurately understand the nature of the analyzed data. This approach is relevant when no training sample with large amounts of data is available (rare events that are difficult to reproduce), or when processing large amounts of data runs into capacity limits.

6. Conclusion

A large number of methods are used to create systems capable of obtaining knowledge from data. In this paper, we consider an approach that, knowing only the values of the neural network weights, finds patterns in the data and builds a logical function that reflects these patterns. The proposed analysis of the sensitivity of this function by the methods of the logical derivative makes it possible to formalize the process of finding importance coefficients for the characteristics of object properties. This is important when the data are incomplete, fuzzy, or distorted by information noise. All this leads to the development of methods for more accurate solution of intellectual problems.

This work was supported by RFBR grant No. 19-01-00648-a.

7. References

[1] Lyutikova L. Sigma-Pi Neural Networks: Error Correction Methods // Procedia Computer Science. 2018. Vol. 145. Pp. 312-318.
[2] Graves A., Wayne G., Reynolds M., Harley T., Danihelka I., Grabska-Barwinska A., Gómez Colmenarejo S., Grefenstette E., Ramalho T., Agapiou J., et al. Hybrid computing using a neural network with dynamic external memory // Nature. 2016. No. 538 (7626). Pp. 471-476.
[3] Naimi A.I., Balzer L.B. Stacked generalization: an introduction to super learning // European Journal of Epidemiology. 2018. Vol. 33. Pp. 459-464.
[4] Yang F., Yang Z., Cohen W.W. Differentiable Learning of Logical Rules for Knowledge Base Reasoning // Advances in Neural Information Processing Systems. 2017. Pp. 2320-2329.
[5] Flach P. Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press, 2012. 396 p. ISBN 978-1107096394.
[6] Rahman A., Tasnim S. Ensemble Classifiers and Their Applications: A Review // International Journal of Computer Trends and Technology. 2014. Vol. 10. No. 1. Pp. 31-35.
[7] Dyukova Ye.V., Zhuravlev Yu.I., Prokof'yev P.A. Methods for increasing the efficiency of logical correctors // Machine Learning and Data Analysis. 2015. Vol. 1. No. 11. Pp. 1555-1583. (in Russian)
[8] Lyutikova L.A., Shmatova E.V. Analysis and synthesis of pattern recognition algorithms using variable-valued logic // Information Technologies. 2016. Vol. 22. No. 4. Pp. 292-297.