<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Using a Logical Derivative to Determine the Information Content of Object Properties in Speech Recognition Tasks</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Larisa</forename><surname>Lyutikova</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Institute of Applied Mathematics and Automation KBSC RAS</orgName>
								<orgName type="department" key="dep2">Shortanova. 89A</orgName>
								<orgName type="institution">KBR</orgName>
								<address>
									<postCode>360000</postCode>
									<settlement>Nalchik</settlement>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Institute for Computer Science and Problems of Regional Management KBSC RAS</orgName>
								<orgName type="institution">KBR</orgName>
								<address>
									<addrLine>Armand</addrLine>
									<postCode>360000, 37A</postCode>
									<settlement>Nalchik</settlement>
									<region>st. I</region>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Using a Logical Derivative to Determine the Information Content of Object Properties in Speech Recognition Tasks</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">300BD3F003D947C871BB679273457ED6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:32+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>decision function</term>
					<term>Boolean derivative</term>
					<term>data analysis</term>
					<term>Sigma-Pi neuron algorithm</term>
					<term>decision trees</term>
					<term>corrective operations</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper offers an approach for determining the information content of object properties in recognition tasks. The approach operates not on the subject area in which the objects and their characteristics are specified, but on a trained neural network that works correctly on that subject area. We propose a method for constructing a decision function from the weight characteristics of a correctly functioning neuron. A logical derivative is used to evaluate the significance of object characteristics: it makes it possible to track how the decision function changes its value when one or more object characteristics change theirs. This allows us to draw conclusions about the most important properties of the subject area under consideration.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>This work belongs to the field of pattern recognition, whose purpose is to classify objects into several categories or classes. The practical value of classification is obvious: after it, working with a particular class of information requires fewer resources than working with the full volume of data. In practice, however, pattern recognition problems are difficult both theoretically and practically, primarily because each specific case has its own specifics, which prevents the creation of a universal algorithm for working with information.</p><p>Today, neural networks are among the most popular tools for problems that have no acceptable mathematical model or exact algorithm but do admit many so-called heuristics, from which one tries to extract more or less accurate patterns inherent in the subject area under study <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>.</p><p>The data used to find such patterns are not perfect: they are usually incomplete and contain many inaccuracies and distortions.</p><p>Although neural networks cope well with a great variety of such tasks, their decision-making rules are opaque to the user. Only the structure and the weight characteristics that the network acquired during training are available.</p><p>Identifying logical connections from the characteristics of a correctly functioning neural network is an important task, since neural networks are built on heuristics and their solutions can be ambiguous. Building object models also takes many training cycles, which entails long running times, and training can reach a dead end. The problem of overfitting is acute.</p><p>To address these problems, corrective methods are important, since they provide an opportunity to gain new knowledge about the patterns in the subject area under study. This, in turn, helps to better understand the nature of the data being studied and therefore to develop more expressive algorithms. To date, the main methods of pattern recognition have been identified. There are four of them: comparison with a standard, the statistical method, artificial neural networks (ANNs), and the structural method <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b3">4]</ref>. However, these methods have a number of obvious disadvantages. Comparison with a standard is complicated by distortions of the samples under consideration, which requires accounting for a considerable number of small deviations from the standard. Statistical methods face difficulties in practical problems, since samples from each class are often not representative enough and probability density functions are difficult to construct. Neural networks require long training on many examples. Structural methods are sensitive to distortions of the recognized images and require a complex procedure for constructing the feature set.</p><p>This situation leads to the need to combine these methods so that the advantages of some approaches compensate for the shortcomings of others, which is actively exploited in solving practical problems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">The purpose and objectives of the study</head><p>This paper offers a combined approach to pattern recognition problems that makes it possible to correct the operation of a neural network. Given an already trained network, even when the subject area is unknown, one can, based only on the weight characteristics of the neuron, recover logical connections in the data under study, build a logical decision function for the given area, and thus pass to structural methods of solution. The Boolean derivative is then used to identify the most important characteristics for each object and for the subject area as a whole. This yields a more accurate picture of the area under study and hence the ability to work with it. For example, in pattern recognition this approach makes it possible, using only the weights of the neural network, to build a logical classifier function that can correct the operation of the network. Further investigation of this function reveals the most important features for each object and shows how changing the values of certain attributes changes the objects.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Research methods</head><p>Methods of training the Sigma-Pi neuron are described by such authors as A. V. Timofeev and Z. M. Shibzukhov. A. V. Timofeev proposed training the network in one pass.</p><p>The construction of the classifier function will be based on studying the weight values of the Sigma-Pi neural network.</p><p>As a solver (a function or method that divides the subject area into classes), we consider the Sigma-Pi neuron, which is a more expressive generalization of classical neural networks, since it corresponds more closely to natural neurons. It can be represented as a weighted sum of products of the input signals.</p><p>EXAMPLE. Suppose the area for training a Sigma-Pi neuron is represented by the following set of features and objects:</p><formula xml:id="formula_0">Table 1. Example.
x1 x2 x3 | y
 0  0  1 | a (2)
 0  1  1 | b (4)
 1  1  0 | c (6)</formula><p>The objects are described by the following sets of attribute values: x = {𝑥 1 = (0,0,1), 𝑥 2 = (0,1,1), 𝑥 3 = (1,1,0)}. The objects themselves are {𝑎, 𝑏, 𝑐}, encoded accordingly as 𝑎 − 2, 𝑏 − 4, 𝑐 − 6. Encoding objects with numerical values is necessary for fast training of this neural network. After training, which is performed in one pass, the Sigma-Pi neuron takes the form 𝑠𝑝(𝑥 1 𝑥 2 𝑥 3 ) = 2𝑥 3 + 2𝑥 2 𝑥 3 + 6𝑥 2 𝑥 1 , so that each training query returns the code of its object. Any query (𝑥 1 , 𝑥 2 , 𝑥 3 ) that is represented in the table will be identified with its corresponding object. If the query does not match the variable values in the training sample, for example (0,1,0), the result may be incorrect or absent altogether: 𝑠𝑝(0,1,0) = 2 · 0 + 2 · 1 · 0 + 6 · 1 · 0 = 0. The network does not recognize any element, because no object with exactly these characteristics is present in the data. However, it could be object 𝑏 − 4 or 𝑐 − 6 if there are inaccuracies, noise, or interference in the data.</p><p>Therefore, to obtain a possible class of solutions, the trained neuron requires additional corrective approaches.</p></div>
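The example above can be sketched in code. This is an illustrative reconstruction, not the authors' implementation: the names `TERMS`, `sp`, and `CODES` are hypothetical, and the coefficient of the x1·x2 term is taken as 6 so that the query (1,1,0) returns code 6 of object c, consistent with the codes a − 2, b − 4, c − 6 in Table 1.

```python
# Sketch of the trained Sigma-Pi neuron sp(x1,x2,x3) = 2*x3 + 2*x2*x3 + 6*x1*x2.
# Each term is (weight, 0-based indices of the variables multiplied together).
TERMS = [(2, (2,)), (2, (1, 2)), (6, (0, 1))]

def sp(x):
    """Evaluate the Sigma-Pi polynomial on a 0/1 query vector."""
    total = 0
    for weight, idx in TERMS:
        prod = 1
        for i in idx:
            prod *= x[i]
        total += weight * prod
    return total

CODES = {2: "a", 4: "b", 6: "c"}

for query in [(0, 0, 1), (0, 1, 1), (1, 1, 0), (0, 1, 0)]:
    print(query, "->", CODES.get(sp(query), "not recognized"))
```

Each training query is mapped to its object's code, while the unseen query (0,1,0) evaluates to 0, i.e. no object is recognized, which is exactly the failure mode that motivates the corrective approaches below.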
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Construction of a logical decision function based on the structure of a neuron and identification of significant features</head><p>When constructing the decision function, the training sample need not be known: it is enough to know the weight values and the structure of the Sigma-Pi neuron. The function is constructed using a tree whose construction algorithm is described in <ref type="bibr" target="#b0">[1]</ref>.</p><p>The number of levels equals the largest number of variables multiplied together in any term, plus 1. In the example, there are 3 levels.</p><p>Place the variables {𝑥 1 , 𝑥 2 , … , 𝑥 𝑛 } at the lowest level of the tree. The second level holds the coefficients of terms with one variable; the third, the coefficients of terms with two variables; and so on. The value of each node is the corresponding coefficient, which the neuron acquires in the learning process.</p><p>EXAMPLE. Let us build a tree for defining the main properties of the objects.</p><formula xml:id="formula_1">𝑠𝑝(𝑥 1 𝑥 2 𝑥 3 ) = 2𝑥 3 + 2𝑥 2 𝑥 3 + 6𝑥 2 𝑥 1</formula><p>As a result of building the tree, one can identify the basic rules relating objects and their characteristics (see figure <ref type="figure" target="#fig_0">1</ref>). However, these rules are not enough if the zero value of a variable is also informative for decision-making, nor are they enough in the case of multi-valued encoding.</p><p>Therefore, there is a need to build additional trees, or imaginary paths; in the figure these are indicated with a dash-dotted line, for example as in figure <ref type="figure">2</ref>.</p></div>
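The level structure just described can be sketched as follows. This is a hypothetical illustration of grouping the polynomial's terms by arity, not the tree-building algorithm of [1]; the function name `tree_levels` and the term encoding are assumptions.

```python
from collections import defaultdict

def tree_levels(terms, n_vars):
    """Group neuron terms (weight, variable-index tuple) into tree levels:
    level 0 holds the variables, level k the coefficients of k-variable terms."""
    levels = defaultdict(list)
    levels[0] = [f"x{i + 1}" for i in range(n_vars)]  # lowest level: the variables
    for weight, idx in terms:
        levels[len(idx)].append((weight, tuple(f"x{i + 1}" for i in idx)))
    return dict(levels)

# sp(x1,x2,x3) = 2*x3 + 2*x2*x3 + 6*x1*x2
levels = tree_levels([(2, (2,)), (2, (1, 2)), (6, (0, 1))], n_vars=3)
print(levels)
```

The largest term has two factors, so there are 2 + 1 = 3 levels, as in the example.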
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 2: Object part of a logical function</head><p>If an object lacks some attribute, the path to it is marked with a dashed line.</p><p>The decision function for our example then looks like this: 𝐹(𝑥 1 𝑥 2 𝑥 3 ) = 𝑃(6)𝑥 1 𝑥̅ 3 ∨ 𝑃(6)𝑃(4)𝑥 2 ∨ 𝑃(4)𝑃(2)𝑥 3 𝑥̅ 1 . To identify the significant variables in this function, i.e. the most important features of the source data, we use the logical derivative.</p><p>Logical differentiation here means differentiation of Boolean functions, which is in some sense analogous to classical differentiation <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. Semantically, the Boolean derivative can show the degree to which a function depends on a given variable, and indicate how justified it is to expect the function to change its value when the value of the variable changes.</p></div>
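One way to read the decision function F(x1x2x3) = P(6)x1x̄3 ∨ P(6)P(4)x2 ∨ P(4)P(2)x3x̄1 is as a map from a query to a set of candidate object codes, where a factor P(k) contributes code k whenever its conjunction of literals is satisfied. A minimal sketch under this assumed reading (the names `CLAUSES` and `decide` are ours, not from the paper):

```python
# Each clause of F: (set of candidate codes from the P(k) factors,
#                    literals as (0-based variable index, required value)).
CLAUSES = [
    ({6},    [(0, 1), (2, 0)]),  # P(6)      x1 x̄3
    ({6, 4}, [(1, 1)]),          # P(6)P(4)  x2
    ({4, 2}, [(2, 1), (0, 0)]),  # P(4)P(2)  x3 x̄1
]

def decide(x):
    """Union of candidate codes over all satisfied conjunctions of F."""
    candidates = set()
    for codes, literals in CLAUSES:
        if all(x[i] == v for i, v in literals):
            candidates |= codes
    return candidates

print(decide((0, 1, 0)))
```

For the query (0,1,0), which the trained neuron mapped to 0, this reading returns the candidate set {4, 6}, matching the earlier observation that the object could be b − 4 or c − 6.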
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 1. The Boolean derivative</head><formula xml:id="formula_2">𝜕𝑓/𝜕𝑥 𝑖</formula><p>of a Boolean function 𝑓(𝑥 1 , … , 𝑥 𝑛 ) with respect to the variable 𝑥 𝑖 is the modulo-2 sum of the corresponding residual functions:</p><formula xml:id="formula_3">𝜕𝑓/𝜕𝑥 𝑖 = 𝑓(𝑥 1 , … , 𝑥 𝑖−1 , 0, 𝑥 𝑖+1 , … , 𝑥 𝑛 ) ⊕ 𝑓(𝑥 1 , … , 𝑥 𝑖−1 , 1, 𝑥 𝑖+1 , … , 𝑥 𝑛 )</formula><p>Definition 2. The weight 𝑃(𝜕𝑓/𝜕𝑥 𝑖 ) of the derivative of a Boolean function is the number of ones ("1") in the column of values of the derivative.</p><p>Statement 1. The weight of the derivative with respect to a given variable shows how strongly the function 𝑓(𝑥 1 , . . . , 𝑥 𝑛 ) depends on the variable 𝑥 𝑖 in comparison with the other variables.</p><p>Definition 3. An expression of the form</p><formula xml:id="formula_6">𝜕 𝑘 𝑓/𝜕(𝑥 1 … 𝑥 𝑘 ) = 𝜕(… (𝜕𝑓/𝜕𝑥 1 ) … )/𝜕𝑥 𝑘</formula><p>is called the mixed derivative of the k-th order with respect to the corresponding variables; the order in which the variables are fixed does not matter.</p><p>A mixed Boolean derivative with respect to k variables indicates the conditions under which the function changes its value when the values of 𝑥 1 , . . . , 𝑥 𝑘 change simultaneously.</p><p>If, after the logical decision function has been constructed, it is necessary to determine the most important characteristics of the objects in the given subject area, one can examine the Boolean derivative with respect to each variable and thus select the most significant properties.</p><p>For our example, the derivative with respect to 𝑥 2 reduces to 𝑃(6)𝑃(4)𝑥̅ 3 𝑥̅ 1 ∨ 𝑃(6)𝑃(4)𝑥 1 𝑥 3 . This result gives conflicting data about only two objects and makes it impossible to classify them. Therefore, it can be argued that the variable 𝑥 2 reflects the most important properties of the data under study. The variables 𝑥 3 and 𝑥 1 are dependent, i.e. they form an ensemble: 𝑥 1 = 𝑥̅ 3 . As a result of the analysis, we can say that, knowing the weight values of the neural network, we can build a logical function that reflects the patterns in the data. These patterns may not be obvious when conventional methods of domain analysis are used. The study of the obtained logical function can reveal the most significant features for each object and for the data considered as a whole.</p></div>
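Definitions 1 and 2 can be computed directly on a truth table. The sketch below is a generic illustration on an ordinary Boolean function, not on the multi-valued solver F above; the stand-in function `f` is an arbitrary example chosen so that the weights differ.

```python
from itertools import product

def boolean_derivative(f, i, n):
    """Value column of df/dx_i = f(..., x_i=0, ...) XOR f(..., x_i=1, ...)."""
    col = []
    for x in product((0, 1), repeat=n):
        x0 = list(x); x0[i] = 0
        x1 = list(x); x1[i] = 1
        col.append(f(tuple(x0)) ^ f(tuple(x1)))
    return col

def weight(f, i, n):
    """P(df/dx_i): number of ones in the derivative's value column (Definition 2)."""
    return sum(boolean_derivative(f, i, n))

# Stand-in function f(x1,x2,x3) = x1 OR (x2 AND x3) on 0/1 inputs.
f = lambda x: x[0] | (x[1] * x[2])
print([weight(f, i, 3) for i in range(3)])  # -> [6, 2, 2]
```

Here the derivative weight is largest for x1, so by Statement 1 this f depends most strongly on x1; the same comparison of weights is what selects the most significant properties in the paper's setting.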
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Comparisons with well-known attention methods in neural networks</head><p>The successfully implemented Attention technology is a way to tell the network what to pay more attention to, that is, to report the probability of a particular outcome depending on the state of the neurons and the incoming data. The Attention layer implemented in Keras itself identifies, from the training sample, the factors that reduce the network error. Important factors are identified through error back-propagation, just as is done for convolutional networks. When the network is trained on data, importance becomes a function of the probability of an outcome given the data received by the network.</p><p>Our method, by contrast, identifies logical connections between the objects of the training sample without the training sample itself, using only the neural network that was formed from it. Unlike the Attention method, the proposed approach identifies logical rather than statistical patterns in the data, and it avoids repeating the back-propagation procedure many times. Knowledge of logical patterns, as opposed to statistical ones, allows one to formalize and more accurately understand the nature of the analyzed data. This approach is relevant when no training sample with large amounts of data is available (rare events that are difficult to reproduce), or when processing large amounts of data runs up against the required computing capacity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>A large number of methods are used to create systems capable of obtaining knowledge from data. In this paper, we consider an approach that, knowing only the values of the neural network weights, allows one to find patterns in the data and build a logical function that reflects these patterns.</p><p>The proposed analysis of the sensitivity of this function by the methods of the logical derivative makes it possible to formalize the process of finding importance coefficients for the characteristics of object properties. This is important when the data are incomplete, fuzzy, or distorted by information noise.</p><p>All this contributes to the development of methods for more accurate solution of intellectual problems.</p><p>This work was supported by RFBR grant No. 19-01-00648-a.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label>1</label><figDesc>If the solver is a Sigma-Pi neuron, it recognizes k elements of the given subject area Y = {y 1 , y 2 , … , y k } generated by the corresponding sets of attributes X 1 , … , X k [1].</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>where the indexes are those of the corresponding objects whose variables enter as factors into the given element.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Decision Tree for a given subject area</figDesc><graphic coords="4,220.18,72.00,154.65,157.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="1,0.00,191.15,594.96,459.74" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Derivatives of the decision function for our example. The resulting derivative with respect to 𝑥 1 can still classify objects by the variable 𝑥 3 .</figDesc><table><row><cell>𝑑𝑓(Х)/𝑑х 1 = 𝑃(6)𝑥̅ 3 ∨ 𝑃(6)𝑃(4)𝑥 2 ⊕ 𝑃(6)𝑃(4)𝑥 2 ∨ 𝑃(4)𝑃(2)𝑥 3 = 𝑃(4)𝑃(2)𝑥 3 ∨ 𝑃(6)𝑥̅ 3</cell></row><row><cell>𝑑𝑓(Х)/𝑑х 2 = 𝑃(6)𝑥̅ 3 𝑥 1 ∨ 𝑃(2)𝑃(4)𝑥̅ 1 𝑥 3 ⊕ 𝑃(2)𝑃(4)𝑥̅ 1 𝑥 3 ∨ 𝑃(6)𝑥̅ 3 𝑥 1 ∨ 𝑃(4)𝑃(6) = 𝑃(6)𝑃(4)𝑥̅ 3 𝑥̅ 1 ∨ 𝑃(6)𝑃(4)𝑥 1 𝑥 3</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Sigma-Pi Neural Networks: Error Correction Methods</title>
		<author>
			<persName><forename type="first">L</forename><surname>Lyutikova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">145</biblScope>
			<biblScope unit="page" from="312" to="318" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Hybrid computing using a neural network with dynamic external memory</title>
		<author>
			<persName><forename type="first">Alex</forename><surname>Graves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><surname>Wayne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malcolm</forename><surname>Reynolds</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tim</forename><surname>Harley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ivo</forename><surname>Danihelka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Agnieszka</forename><surname>Grabskabarwinska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sergio</forename><surname>Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edward</forename><surname>Grefenstette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tiago</forename><surname>Ramalho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><surname>Agapiou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="page" from="471" to="476" />
			<date type="published" when="2016">2016. 7626</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Stacked generalization: an introduction to super learning</title>
		<author>
			<persName><forename type="first">Ashley</forename><forename type="middle">I</forename><surname>Naimi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Laura</forename><forename type="middle">B</forename><surname>Balzer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">European Journal of Epidemiology</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="459" to="464" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Differentiable Learning of Logical Rules for Knowledge Base Reasoning // Advances in Neural Information Processing Systems</title>
		<author>
			<persName><forename type="first">Fan</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhilin</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">William</forename><forename type="middle">W</forename><surname>Cohen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017-12">2017. December, 2017</date>
			<biblScope unit="page" from="2320" to="2329" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Machine Learning: The Art and Science of Algorithms that Make Sense of Data</title>
		<author>
			<persName><forename type="first">Peter</forename><surname>Flach</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<publisher>Cambridge University Press</publisher>
			<biblScope unit="page">396</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Ensemble Classifiers and Their Applications: A Review</title>
		<author>
			<persName><forename type="first">Rahman</forename><surname>Akhlaqur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tasnim</forename><surname>Sumaira</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Trends and Technology</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="31" to="35" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Metody povysheniya effektivnosti logicheskikh korrektorov [Methods for improving the efficiency of logical correctors]</title>
		<author>
			<persName><forename type="first">Ye</forename><forename type="middle">V</forename><surname>Dyukova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yu</forename><forename type="middle">I</forename><surname>Zhuravlev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Prokof'yev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Mashinnoye obucheniye i analiz dannykh</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="1555" to="1583" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Analysis and synthesis of pattern recognition algorithms using variable-valued logic</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Lyutikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">V</forename><surname>Shmatova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Technologies</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="292" to="297" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
