<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Using a Logical Derivative to Determine the Information Content of Object Properties in Speech Recognition Tasks</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Larisa</forename><surname>Lyutikova</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Institute of Applied Mathematics and Automation KBSC RAS</orgName>
								<orgName type="department" key="dep2">Shortanova. 89A</orgName>
								<orgName type="institution">KBR</orgName>
								<address>
									<postCode>360000</postCode>
									<settlement>Nalchik</settlement>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Institute for Computer Science and Problems of Regional Management KBSC RAS</orgName>
								<orgName type="institution">KBR</orgName>
								<address>
									<addrLine>Armand</addrLine>
									<postCode>360000, 37A</postCode>
									<settlement>Nalchik</settlement>
									<region>st. I</region>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Using a Logical Derivative to Determine the Information Content of Object Properties in Speech Recognition Tasks</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">300BD3F003D947C871BB679273457ED6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:32+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>decision function</term>
					<term>Boolean derivative</term>
					<term>data analysis</term>
					<term>Sigma-Pi neuron algorithm</term>
					<term>decision trees</term>
					<term>corrective operations</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper offers an approach for determining the information content of object properties in recognition tasks. The approach operates not on the subject area in which the objects and their characteristics are specified, but on a trained neural network that works correctly on that subject area. We propose a method for constructing a decision function from the weight characteristics of a correctly functioning neuron. A logical derivative is used to evaluate the significance of object characteristics: it makes it possible to track how the decision function changes its value when one or more object characteristics change theirs. This allows us to draw conclusions about the most important properties of the subject area under consideration.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>This work belongs to the field of pattern recognition, whose purpose is to classify objects into several categories or classes. The practical value of classification is obvious: after it, working with a particular class of information requires fewer resources than working with the full volume of data. In practice, however, pattern recognition problems are difficult both theoretically and practically, primarily because each specific case has its own specifics, which prevents the creation of a universal algorithm for working with information.</p><p>Today, neural networks are among the most popular tools for problems that have no acceptable mathematical model or exact algorithm but do admit many so-called heuristics, from which one tries to extract more or less accurate patterns inherent in the subject area under study <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>.</p><p>The data used to find such patterns are not perfect: they are usually incomplete and contain many inaccuracies and distortions.</p><p>Although neural networks cope well with a great variety of such tasks, their decision-making rules are opaque to the user. Only the structure and the weight characteristics that the network acquired during training are available.</p><p>Identifying logical connections from the characteristics of a correctly functioning neural network is an important task, since neural networks are built on heuristics and their solutions can be ambiguous. Building object models also takes many training cycles, which entails long running times, and training can reach a dead end. The problem of overfitting is acute.</p><p>To address these problems, corrective methods are important, since they provide an opportunity to gain new knowledge about the patterns in the subject area under study. This, in turn, helps to better understand the nature of the data being studied and therefore to develop more expressive algorithms. To date, the main methods of pattern recognition have been identified. There are four of them: comparison with a standard, the statistical method, artificial neural networks (ANNs), and the structural method <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b3">4]</ref>. However, these methods have a number of obvious disadvantages. Comparison with a standard is complicated by distortions of the samples under consideration, which requires accounting for a considerable number of small deviations from the standard. Statistical methods face difficulties in practical problems, since samples from each class are often not representative enough and probability density functions are difficult to construct. Neural networks require long training on many examples. Structural methods are sensitive to distortions of the recognized images and require a complex procedure for constructing the feature set.</p><p>This situation leads to the need to combine these methods so that the advantages of some approaches compensate for the shortcomings of others, which is actively exploited in solving practical problems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">The purpose and objectives of the study</head><p>This paper offers a combined approach to pattern recognition problems that makes it possible to correct the operation of a neural network. Given an already trained network, even when the subject area is unknown, one can, based only on the weight characteristics of the neuron, recover logical connections in the data under study, build a logical decision function for the given area, and thus pass to structural methods of solution. The Boolean derivative is then used to identify the most important characteristics for each object and for the subject area as a whole. This yields a more accurate picture of the area under study and hence the ability to work with it. For example, in pattern recognition this approach makes it possible, using only the weights of the neural network, to build a logical classifier function that can correct the operation of the network. Further investigation of this function reveals the most important features for each object and shows how changing the values of certain attributes changes the objects.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Research methods</head><p>Methods of training the Sigma-Pi neuron are described by such authors as A. V. Timofeev and Z. M. Shibzukhov. A. V. Timofeev proposed training the network in one pass.</p><p>The construction of the classifier function will be based on studying the weight values of the Sigma-Pi neural network.</p><p>As a solver (a function or method that divides the subject area into classes), we consider the Sigma-Pi neuron, which is a more expressive generalization of classical neural networks, since it corresponds more closely to natural neurons. It can be represented as a weighted sum of products of the input signals.</p><p>EXAMPLE. Suppose the area for training a Sigma-Pi neuron is represented by the following set of features and objects:</p><formula xml:id="formula_0">Table 1. Example.
x1 x2 x3 | y
 0  0  1 | a (2)
 0  1  1 | b (4)
 1  1  0 | c (6)</formula><p>The objects are described by the following sets of attribute values: x = {𝑥 1 = (0,0,1), 𝑥 2 = (0,1,1), 𝑥 3 = (1,1,0)}. The objects themselves are {𝑎, 𝑏, 𝑐}, encoded accordingly as 𝑎 − 2, 𝑏 − 4, 𝑐 − 6. Encoding objects with numerical values is necessary for fast training of this neural network. After training, which is performed in one pass, the Sigma-Pi neuron takes the form 𝑠𝑝(𝑥 1 𝑥 2 𝑥 3 ) = 2𝑥 3 + 2𝑥 2 𝑥 3 + 6𝑥 2 𝑥 1 , so that each training query returns the code of its object. Any query (𝑥 1 , 𝑥 2 , 𝑥 3 ) that is represented in the table will be identified with its corresponding object. If the query does not match the variable values in the training sample, for example (0,1,0), the result may be incorrect or absent altogether: 𝑠𝑝(0,1,0) = 2 · 0 + 2 · 1 · 0 + 6 · 1 · 0 = 0. The network does not recognize any element, because no object with exactly these characteristics is present in the data. However, it could be object 𝑏 − 4 or 𝑐 − 6 if there are inaccuracies, noise, or interference in the data.</p><p>Therefore, to obtain a possible class of solutions, the trained neuron requires additional corrective approaches.</p></div>
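The example above can be sketched in code. This is an illustrative reconstruction, not the authors' implementation: the names `TERMS`, `sp`, and `CODES` are hypothetical, and the coefficient of the x1·x2 term is taken as 6 so that the query (1,1,0) returns code 6 of object c, consistent with the codes a − 2, b − 4, c − 6 in Table 1.

```python
# Sketch of the trained Sigma-Pi neuron sp(x1,x2,x3) = 2*x3 + 2*x2*x3 + 6*x1*x2.
# Each term is (weight, 0-based indices of the variables multiplied together).
TERMS = [(2, (2,)), (2, (1, 2)), (6, (0, 1))]

def sp(x):
    """Evaluate the Sigma-Pi polynomial on a 0/1 query vector."""
    total = 0
    for weight, idx in TERMS:
        prod = 1
        for i in idx:
            prod *= x[i]
        total += weight * prod
    return total

CODES = {2: "a", 4: "b", 6: "c"}

for query in [(0, 0, 1), (0, 1, 1), (1, 1, 0), (0, 1, 0)]:
    print(query, "->", CODES.get(sp(query), "not recognized"))
```

Each training query is mapped to its object's code, while the unseen query (0,1,0) evaluates to 0, i.e. no object is recognized, which is exactly the failure mode that motivates the corrective approaches below.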
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Construction of a logical decision function based on the structure of a neuron and identification of significant features</head><p>When constructing the decision function, the training sample need not be known: it is enough to know the weight values and the structure of the Sigma-Pi neuron. The function is constructed using a tree whose construction algorithm is described in <ref type="bibr" target="#b0">[1]</ref>.</p><p>The number of levels equals the largest number of variables multiplied together in any term, plus 1. In the example, there are 3 levels.</p><p>Place the variables {𝑥 1 , 𝑥 2 , … , 𝑥 𝑛 } at the lowest level of the tree. The second level holds the coefficients of terms with one variable; the third, the coefficients of terms with two variables; and so on. The value of each node is the corresponding coefficient, which the neuron acquires in the learning process.</p><p>EXAMPLE. Let us build a tree for defining the main properties of the objects.</p><formula xml:id="formula_1">𝑠𝑝(𝑥 1 𝑥 2 𝑥 3 ) = 2𝑥 3 + 2𝑥 2 𝑥 3 + 6𝑥 2 𝑥 1</formula><p>As a result of building the tree, one can identify the basic rules relating objects and their characteristics (see figure <ref type="figure" target="#fig_0">1</ref>). However, these rules are not enough if the zero value of a variable is also informative for decision-making, nor are they enough in the case of multi-valued encoding.</p><p>Therefore, there is a need to build additional trees, or imaginary paths; in the figure these are indicated with a dash-dotted line, for example as in figure <ref type="figure">2</ref>.</p></div>
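The level structure just described can be sketched as follows. This is a hypothetical illustration of grouping the polynomial's terms by arity, not the tree-building algorithm of [1]; the function name `tree_levels` and the term encoding are assumptions.

```python
from collections import defaultdict

def tree_levels(terms, n_vars):
    """Group neuron terms (weight, variable-index tuple) into tree levels:
    level 0 holds the variables, level k the coefficients of k-variable terms."""
    levels = defaultdict(list)
    levels[0] = [f"x{i + 1}" for i in range(n_vars)]  # lowest level: the variables
    for weight, idx in terms:
        levels[len(idx)].append((weight, tuple(f"x{i + 1}" for i in idx)))
    return dict(levels)

# sp(x1,x2,x3) = 2*x3 + 2*x2*x3 + 6*x1*x2
levels = tree_levels([(2, (2,)), (2, (1, 2)), (6, (0, 1))], n_vars=3)
print(levels)
```

The largest term has two factors, so there are 2 + 1 = 3 levels, as in the example.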
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 2: Object part of a logical function</head><p>If an object lacks some attribute, the path to it is marked with a dashed line.</p><p>The decision function for our example then looks like this: 𝐹(𝑥 1 𝑥 2 𝑥 3 ) = 𝑃(6)𝑥 1 𝑥̅ 3 ∨ 𝑃(6)𝑃(4)𝑥 2 ∨ 𝑃(4)𝑃(2)𝑥 3 𝑥̅ 1 . To identify the significant variables in this function, i.e. the most important features of the source data, we use the logical derivative.</p><p>Logical differentiation here means differentiation of Boolean functions, which is in some sense analogous to classical differentiation <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. Semantically, the Boolean derivative can show the degree to which a function depends on a given variable, and indicate how justified it is to expect the function to change its value when the value of the variable changes.</p></div>
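One way to read the decision function F(x1x2x3) = P(6)x1x̄3 ∨ P(6)P(4)x2 ∨ P(4)P(2)x3x̄1 is as a map from a query to a set of candidate object codes, where a factor P(k) contributes code k whenever its conjunction of literals is satisfied. A minimal sketch under this assumed reading (the names `CLAUSES` and `decide` are ours, not from the paper):

```python
# Each clause of F: (set of candidate codes from the P(k) factors,
#                    literals as (0-based variable index, required value)).
CLAUSES = [
    ({6},    [(0, 1), (2, 0)]),  # P(6)      x1 x̄3
    ({6, 4}, [(1, 1)]),          # P(6)P(4)  x2
    ({4, 2}, [(2, 1), (0, 0)]),  # P(4)P(2)  x3 x̄1
]

def decide(x):
    """Union of candidate codes over all satisfied conjunctions of F."""
    candidates = set()
    for codes, literals in CLAUSES:
        if all(x[i] == v for i, v in literals):
            candidates |= codes
    return candidates

print(decide((0, 1, 0)))
```

For the query (0,1,0), which the trained neuron mapped to 0, this reading returns the candidate set {4, 6}, matching the earlier observation that the object could be b − 4 or c − 6.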
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 1. The Boolean derivative</head><formula xml:id="formula_2">𝜕𝑓/𝜕𝑥 𝑖</formula><p>of a Boolean function 𝑓(𝑥 1 , … , 𝑥 𝑛 ) with respect to the variable 𝑥 𝑖 is the modulo-2 sum of the corresponding residual functions:</p><formula xml:id="formula_3">𝜕𝑓/𝜕𝑥 𝑖 = 𝑓(𝑥 1 , … , 𝑥 𝑖−1 , 0, 𝑥 𝑖+1 , … , 𝑥 𝑛 ) ⊕ 𝑓(𝑥 1 , … , 𝑥 𝑖−1 , 1, 𝑥 𝑖+1 , … , 𝑥 𝑛 )</formula><p>Definition 2. The weight 𝑃(𝜕𝑓/𝜕𝑥 𝑖 ) of the derivative of a Boolean function is the number of ones ("1") in the column of values of the derivative.</p><p>Statement 1. The weight of the derivative with respect to a given variable shows how strongly the function 𝑓(𝑥 1 , . . . , 𝑥 𝑛 ) depends on the variable 𝑥 𝑖 in comparison with the other variables.</p><p>Definition 3. An expression of the form</p><formula xml:id="formula_6">𝜕 𝑘 𝑓/𝜕(𝑥 1 … 𝑥 𝑘 ) = 𝜕(… (𝜕𝑓/𝜕𝑥 1 ) … )/𝜕𝑥 𝑘</formula><p>is called the mixed derivative of the k-th order with respect to the corresponding variables; the order in which the variables are fixed does not matter.</p><p>A mixed Boolean derivative with respect to k variables indicates the conditions under which the function changes its value when the values of 𝑥 1 , . . . , 𝑥 𝑘 change simultaneously.</p><p>If, after the logical decision function has been constructed, it is necessary to determine the most important characteristics of the objects in the given subject area, one can examine the Boolean derivative with respect to each variable and thus select the most significant properties.</p><p>For our example, the derivative with respect to 𝑥 2 reduces to 𝑃(6)𝑃(4)𝑥̅ 3 𝑥̅ 1 ∨ 𝑃(6)𝑃(4)𝑥 1 𝑥 3 . This result gives conflicting data about only two objects and makes it impossible to classify them. Therefore, it can be argued that the variable 𝑥 2 reflects the most important properties of the data under study. The variables 𝑥 3 and 𝑥 1 are dependent, i.e. they form an ensemble: 𝑥 1 = 𝑥̅ 3 . As a result of the analysis, we can say that, knowing the weight values of the neural network, we can build a logical function that reflects the patterns in the data. These patterns may not be obvious when conventional methods of domain analysis are used. The study of the obtained logical function can reveal the most significant features for each object and for the data considered as a whole.</p></div>
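Definitions 1 and 2 can be computed directly on a truth table. The sketch below is a generic illustration on an ordinary Boolean function, not on the multi-valued solver F above; the stand-in function `f` is an arbitrary example chosen so that the weights differ.

```python
from itertools import product

def boolean_derivative(f, i, n):
    """Value column of df/dx_i = f(..., x_i=0, ...) XOR f(..., x_i=1, ...)."""
    col = []
    for x in product((0, 1), repeat=n):
        x0 = list(x); x0[i] = 0
        x1 = list(x); x1[i] = 1
        col.append(f(tuple(x0)) ^ f(tuple(x1)))
    return col

def weight(f, i, n):
    """P(df/dx_i): number of ones in the derivative's value column (Definition 2)."""
    return sum(boolean_derivative(f, i, n))

# Stand-in function f(x1,x2,x3) = x1 OR (x2 AND x3) on 0/1 inputs.
f = lambda x: x[0] | (x[1] * x[2])
print([weight(f, i, 3) for i in range(3)])  # -> [6, 2, 2]
```

Here the derivative weight is largest for x1, so by Statement 1 this f depends most strongly on x1; the same comparison of weights is what selects the most significant properties in the paper's setting.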
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Comparisons with well-known attention methods in neural networks</head><p>The successfully implemented Attention technology is a way to tell the network what to pay more attention to, that is, to report the probability of a particular outcome depending on the state of the neurons and the incoming data. The Attention layer implemented in Keras itself identifies, from the training sample, the factors that reduce the network error. Important factors are identified through error back-propagation, just as is done for convolutional networks. When the network is trained on data, importance becomes a function of the probability of an outcome given the data received by the network.</p><p>Our method, by contrast, identifies logical connections between the objects of the training sample without the training sample itself, using only the neural network that was formed from it. Unlike the Attention method, the proposed approach identifies logical rather than statistical patterns in the data, and it avoids repeating the back-propagation procedure many times. Knowledge of logical patterns, as opposed to statistical ones, allows one to formalize and more accurately understand the nature of the analyzed data. This approach is relevant when no training sample with large amounts of data is available (rare events that are difficult to reproduce), or when processing large amounts of data runs up against the required computing capacity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>A large number of methods are used to create systems capable of obtaining knowledge from data. In this paper, we consider an approach that, knowing only the values of the neural network weights, allows one to find patterns in the data and build a logical function that reflects these patterns.</p><p>The proposed analysis of the sensitivity of this function by the methods of the logical derivative makes it possible to formalize the process of finding importance coefficients for the characteristics of object properties. This is important when the data are incomplete, fuzzy, or distorted by information noise.</p><p>All this contributes to the development of methods for more accurate solution of intellectual problems.</p><p>This work was supported by RFBR grant No. 19-01-00648-a.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label>1</label><figDesc>If the solver is a Sigma-Pi neuron, it recognizes k elements of the given subject area Y = {y 1 , y 2 , … , y k } generated by the corresponding sets of attributes X 1 , … , X k [1].</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>where the indexes are those of the corresponding objects whose variables enter as factors into the given element.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Decision Tree for a given subject area</figDesc><graphic coords="4,220.18,72.00,154.65,157.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="1,0.00,191.15,594.96,459.74" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Derivatives of the decision function for our example. The resulting derivative with respect to 𝑥 1 can still classify objects by the variable 𝑥 3 .</figDesc><table><row><cell>𝑑𝑓(Х)/𝑑х 1 = 𝑃(6)𝑥̅ 3 ∨ 𝑃(6)𝑃(4)𝑥 2 ⊕ 𝑃(6)𝑃(4)𝑥 2 ∨ 𝑃(4)𝑃(2)𝑥 3 = 𝑃(4)𝑃(2)𝑥 3 ∨ 𝑃(6)𝑥̅ 3</cell></row><row><cell>𝑑𝑓(Х)/𝑑х 2 = 𝑃(6)𝑥̅ 3 𝑥 1 ∨ 𝑃(2)𝑃(4)𝑥̅ 1 𝑥 3 ⊕ 𝑃(2)𝑃(4)𝑥̅ 1 𝑥 3 ∨ 𝑃(6)𝑥̅ 3 𝑥 1 ∨ 𝑃(4)𝑃(6) = 𝑃(6)𝑃(4)𝑥̅ 3 𝑥̅ 1 ∨ 𝑃(6)𝑃(4)𝑥 1 𝑥 3</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Sigma-Pi Neural Networks: Error Correction Methods</title>
		<author>
			<persName><forename type="first">L</forename><surname>Lyutikova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">145</biblScope>
			<biblScope unit="page" from="312" to="318" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Hybrid computing using a neural network with dynamic external memory</title>
		<author>
			<persName><forename type="first">Alex</forename><surname>Graves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><surname>Wayne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malcolm</forename><surname>Reynolds</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tim</forename><surname>Harley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ivo</forename><surname>Danihelka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Agnieszka</forename><surname>Grabskabarwinska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sergio</forename><surname>Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edward</forename><surname>Grefenstette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tiago</forename><surname>Ramalho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><surname>Agapiou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="page" from="471" to="476" />
			<date type="published" when="2016">2016. 7626</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Stacked generalization: an introduction to super learning</title>
		<author>
			<persName><forename type="first">Ashley</forename><forename type="middle">I</forename><surname>Naimi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Laura</forename><forename type="middle">B</forename><surname>Balzer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">European Journal of Epidemiology</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="459" to="464" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Differentiable Learning of Logical Rules for Knowledge Base Reasoning // Advances in Neural Information Processing Systems</title>
		<author>
			<persName><forename type="first">Fan</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhilin</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">William</forename><forename type="middle">W</forename><surname>Cohen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017-12">2017. December, 2017</date>
			<biblScope unit="page" from="2320" to="2329" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Machine Learning: The Art and Science of Algorithms that Make Sense of Data</title>
		<author>
			<persName><forename type="first">Peter</forename><surname>Flach</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<publisher>Cambridge University Press</publisher>
			<biblScope unit="page">396</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Ensemble Classifiers and Their Applications: A Review</title>
		<author>
			<persName><forename type="first">Rahman</forename><surname>Akhlaqur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tasnim</forename><surname>Sumaira</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Trends and Technology</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="31" to="35" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Metody povysheniya effektivnosti logicheskikh korrektorov [Methods for improving the efficiency of logical correctors]</title>
		<author>
			<persName><forename type="first">Ye</forename><forename type="middle">V</forename><surname>Dyukova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yu</forename><forename type="middle">I</forename><surname>Zhuravlev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Prokof'yev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Mashinnoye obucheniye i analiz dannykh</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="1555" to="1583" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Analysis and synthesis of pattern recognition algorithms using variable-valued logic</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Lyutikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">V</forename><surname>Shmatova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Technologies</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="292" to="297" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
