<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>COLINS-2022</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Method for Recognizing Linguistic Constructions Based on Stochastic Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eugene Fedorov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Nechyporenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cherkasy State Technological University</institution>
          ,
          <addr-line>Shevchenko blvd., 460, Cherkasy, 18006</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>6</volume>
      <fpage>12</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>The paper proposes a method for recognizing linguistic constructions based on stochastic neural networks. The novelty of the study is as follows: to support the interaction of software agents that represent subjects operating within supply chains, two artificial neural network models for recognizing natural language constructions were created on the basis of the restricted Boltzmann machine; unlike the classical machine, the neurons of their hidden layer are interconnected. A criterion for evaluating the effectiveness of training the proposed models was chosen, and the parameters of the proposed models were identified based on contrastive divergence. The proposed models and the methods for their parametric identification make it possible to improve the recognition accuracy of natural language constructions. The proposed method can be used in various intelligent systems that rely on the recognition of natural language constructions.</p>
      </abstract>
      <kwd-group>
        <kwd>supply chain</kwd>
        <kwd>multi-agent interaction</kwd>
        <kwd>artificial neural network</kwd>
        <kwd>restricted Boltzmann machine</kwd>
        <kwd>contrastive divergence</kwd>
        <kwd>linguistic constructions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Neural network recognition methods have the following disadvantages:
• difficulty in forming a representative training sample;
• high probability that the training and adaptation method gets stuck in a local extremum;
• the knowledge accumulated by the network is inaccessible to human understanding (the relationship between input and output cannot be represented in the form of rules), since it is distributed among all elements of the neural network and stored in its weight coefficients.</p>
      <p>The following recurrent networks are most often used as neural networks for recognition:
• the Elman neural network (ENN), or simple recurrent network (SRN) [15, 16], a recurrent two-layer network based on the multilayer perceptron (MLP). Its advantages are a simpler architecture and a higher learning rate than gated and bidirectional networks; its disadvantage is lower recognition accuracy than bidirectional networks;
• the bidirectional recurrent neural network (BRNN) [17, 18], a recurrent two-layer network built from two Elman neural networks. Its advantage is higher recognition accuracy than a conventional Elman neural network; its disadvantages are a more complex architecture definition and a lower learning rate than a conventional Elman neural network;
• long short-term memory (LSTM) [19, 20], a recurrent network built from memory blocks (containing one or more cells) and input, output, and forget gates (FIR filters). Its advantage is higher recognition accuracy than a conventional Elman neural network; its disadvantages are a more complex architecture definition and a lower learning rate than a conventional Elman neural network;
• the bidirectional long short-term memory network (BLSTM) [21, 22], a recurrent network built from two LSTM networks. Its advantage is higher recognition accuracy than a conventional LSTM; its disadvantages are a more complex architecture definition and a lower learning rate than a conventional LSTM;
• the gated recurrent unit (GRU) [23, 24], a recurrent two-layer network built from hidden blocks with reset and update gates (FIR filters). Its advantage is higher recognition accuracy than a conventional Elman neural network; its disadvantages are a more complex architecture definition and a lower learning rate than a conventional Elman neural network;
• the bidirectional gated recurrent unit network (BGRU) [25], a recurrent network built from two GRU networks. Its advantage is higher recognition accuracy than a conventional GRU; its disadvantages are a more complex architecture definition and a lower learning rate than a conventional GRU.</p>
      <p>Thus, none of the networks satisfies all the criteria.</p>
      <p>The aim of the work is to develop a method for recognizing natural language constructions. To achieve this aim, the following tasks were set and solved:
• analyze existing recognition methods;
• propose neural network recognition models;
• choose a criterion for evaluating the effectiveness of neural network recognition models;
• propose methods for determining the values of the parameters of neural network recognition models;
• conduct a numerical study.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Block diagram of neural network recognition models</title>
    </sec>
    <sec id="sec-3">
      <title>3. Neural network recognition models</title>
      <sec id="sec-3-1">
        <title>3.1. Recognition model based on a unidirectional RBMRHL</title>
        <p>Positive phase (steps 1-2)</p>
        <p>1. Setting the states of the visible neurons: $x^{in} = x_1^{in}$, $x^{out} = 0$.</p>
        <p>2. Computation of the state of hidden neurons
$$P_j = \frac{1}{1 + \exp\left( -b_j^{h} - \sum_{i=1}^{N^{in}} w_{ij}^{in-h} x_i^{in} - \sum_{i=1}^{N^{out}} w_{ij}^{out-h} x_i^{out} - \sum_{i=1}^{N^{h}} w_{ij}^{h-h} x_i^{h} \right)},$$
where $x_i^{h}$ is the hidden state from the previous step (zero initially), and $x_j^{h} = 1$ if $P_j \ge U(0,1)$ and $x_j^{h} = 0$ otherwise, $j \in \overline{1, N^{h}}$; $U(0,1)$ is a random number uniformly distributed on the interval $(0,1)$.</p>
        <p>Negative phase (step 3)</p>
        <p>3. Computation of the state of visible output neurons
$$P_j = \frac{1}{1 + \exp\left( -b_j^{out} - \sum_{i=1}^{N^{h}} w_{ij}^{out-h} x_i^{h} \right)},$$
$x_j^{out} = 1$ if $P_j \ge U(0,1)$ and $x_j^{out} = 0$ otherwise, $j \in \overline{1, N^{out}}$.</p>
        <p>The result is the vector $(x_1^{out}, \ldots, x_{N^{out}}^{out})$.</p>
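        <p>To make the computation concrete, a minimal NumPy sketch of this recognition pass is given below. The array names (w_in_h, w_out_h, w_h_h, b_h, b_out) and the dense-matrix layout are illustrative assumptions, not the authors' implementation.</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample(p):
    # x_j = 1 if P_j >= U(0,1), else 0 (stochastic binary activation)
    return (p >= rng.random(p.shape)).astype(float)

def rbmrhl_recognize(x_in, w_in_h, w_out_h, w_h_h, b_h, b_out):
    """One recognition pass of the unidirectional RBMRHL (sketch).

    x_in    : binary input vector, shape (N_in,)
    w_in_h  : input-to-hidden weights, shape (N_in, N_h)
    w_out_h : output-to-hidden weights, shape (N_out, N_h)
    w_h_h   : recurrent hidden-to-hidden weights (zero diagonal), shape (N_h, N_h)
    """
    x_out = np.zeros(w_out_h.shape[0])   # step 1: x^out = 0
    x_h = np.zeros(w_h_h.shape[0])       # previous hidden state, zero initially
    # step 2: hidden states driven by input, output and recurrent signals
    x_h = sample(sigmoid(b_h + x_in @ w_in_h + x_out @ w_out_h + x_h @ w_h_h))
    # step 3: visible output states reconstructed from the hidden states
    return sample(sigmoid(b_out + x_h @ w_out_h.T))
        </preformat>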
      <sec id="sec-1-1">
        <title>The result is vector (x1out ,..., xNouotut ) .</title>
        <p>3.2.</p>
        <p>Recognition model based on a bidirectional RBMRHL
1.
2.</p>
        <p>xin = x1in , xout = 0
Computation of the state of hidden neurons
1
N out
1
N out
1
P =</p>
        <p>j
P =
j

</p>
        <p>N in</p>
        <p>N in
1 + exp − bhj − ∑ wLiinj−h xiin − ∑ wLoijut −h xiout − ∑ wLhij−h xLhi 
 i=1 i=1 i=1 
1 + exp − bhj − ∑ wRiijn−h xiin − ∑ wRiojut −h xiout − ∑ wRihj −h
 i=1 i=1 i=1
N h</p>
        <p>N h
1,
xRh = 
j
0,</p>
        <p>Computation of the state of visible output neurons</p>
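        <p>A matching sketch for the bidirectional pass is given below. It assumes the parameters of the two hidden chains are collected in dictionaries L and R (keys w_in_h, w_out_h, w_h_h, b_h); this is an illustrative convention, not the authors' code.</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sample = lambda p: (p >= rng.random(p.shape)).astype(float)

def brbmrhl_recognize(x_in, L, R, b_out):
    """One recognition pass of the bidirectional RBMRHL (sketch)."""
    x_out = np.zeros(b_out.shape[0])            # step 1: x^out = 0
    hidden = {}
    for name, c in (("L", L), ("R", R)):
        h_prev = np.zeros(c["b_h"].shape[0])    # xL^h, xR^h start at zero
        p = sigmoid(c["b_h"] + x_in @ c["w_in_h"]
                    + x_out @ c["w_out_h"] + h_prev @ c["w_h_h"])
        hidden[name] = sample(p)                # step 2: both hidden chains
    # step 3: output units pool evidence from both hidden chains
    p_out = sigmoid(b_out + hidden["L"] @ L["w_out_h"].T
                    + hidden["R"] @ R["w_out_h"].T)
    return sample(p_out)
        </preformat>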
      </sec>
      <sec id="sec-1-2">
        <title>The result is vector (x1out ,..., xNouotut ) .</title>
        <p>models
4. Criteria for evaluating the effectiveness of neural network recognition</p>
        <p>
          In this work, for training a unidirectional and bidirectional RBMRHL model, the model adequacy
criterion was chosen, which means choosing such parameter values W = {wiijn−h , wiojut −h , w
(matching the model output and the desired output):
} respectively, that deliver maximum accuracy
F =
1
The training of the RBMRHL model is subject to the criterion (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ).
(
          <xref ref-type="bibr" rid="ref1">1</xref>
          )
5. Methods for determining the values of parameters of neural network
recognition models
5.1. Principles for determining the parameters of neural network
recognition models
        </p>
        <p>RBMRHL parameter values are determined using the CD-1 contrastive divergence method, which speeds up supervised learning: instead of running the network until the neuron states stabilize, it performs only one step of updating their states. RBMRHL classification operates in two phases, positive and negative.</p>
        <p>For RBMRHL recognition, in the positive phase the visible input and output neurons are fixed and RBMRHL runs until the hidden neurons settle. In the negative phase, first the hidden neurons obtained in the positive phase are fixed and RBMRHL runs until the visible input and output neurons settle; then the visible input and output neurons obtained in the negative phase are fixed and RBMRHL runs until the hidden neurons settle.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Method for determining the parameter values of a unidirectional RBMRHL model for recognition based on contrastive divergence</title>
        <p>1. Set the training iteration number $n = 1$. Initialize, by means of a uniform distribution on the interval $(0,1)$ or $[-0.5, 0.5]$, the offsets (thresholds) $b_i^{in}(n)$, $i \in \overline{1, N^{in}}$, $b_i^{out}(n)$, $i \in \overline{1, N^{out}}$, $b_j^{h}(n)$, $j \in \overline{1, N^{h}}$, and the weights $w_{ij}^{in-h}(n)$, $i \in \overline{1, N^{in}}$, $j \in \overline{1, N^{h}}$, $w_{ij}^{out-h}(n)$, $i \in \overline{1, N^{out}}$, $j \in \overline{1, N^{h}}$, $w_{ij}^{h-h}(n)$, $i \in \overline{1, N^{h}}$, $j \in \overline{1, N^{h}}$, subject to $w_{ii}^{in-h}(n) = 0$, $w_{ii}^{out-h}(n) = 0$, $w_{ii}^{h-h}(n) = 0$ and the symmetry conditions $w_{ij}^{in-h}(n) = w_{ji}^{in-h}(n)$, $w_{ij}^{out-h}(n) = w_{ji}^{out-h}(n)$, $w_{ij}^{h-h}(n) = w_{ji}^{h-h}(n)$.</p>
        <p>2. Specify the training set $\{ (x_\mu^{in}, x_\mu^{out}) \mid x_\mu^{in} \in \{0,1\}^{N^{in}}, x_\mu^{out} \in \{0,1\}^{N^{out}} \}$, $\mu \in \overline{1, P}$, where $x_\mu^{in}$ is the $\mu$-th training vector of states of the visible input neurons, $x_\mu^{out}$ is the $\mu$-th training vector of states of the visible output neurons, and $P$ is the power (size) of the training set.</p>
        <p>Positive phase (steps 3-7)</p>
        <p>3. $x1_\mu^{in} = x_\mu^{in}$, $x1_\mu^{out} = x_\mu^{out}$, $\mu \in \overline{1, P}$.</p>
        <p>4. $\mu = 1$, $x1_{0}^{h} = 0$.</p>
        <p>5. $x^{in} = x1_\mu^{in}$, $x^{out} = x1_\mu^{out}$, $x^{h} = x1_{\mu-1}^{h}$.</p>
        <p>6. Computation of the state of hidden neurons
$$P_j = \frac{1}{1 + \exp\left( -b_j^{h}(n) - \sum_{i=1}^{N^{in}} w_{ij}^{in-h}(n) x_i^{in} - \sum_{i=1}^{N^{out}} w_{ij}^{out-h}(n) x_i^{out} - \sum_{i=1}^{N^{h}} w_{ij}^{h-h}(n) x_i^{h} \right)},$$
$x_j^{h} = 1$ if $P_j \ge U(0,1)$ and $x_j^{h} = 0$ otherwise, $j \in \overline{1, N^{h}}$.</p>
        <p>7. $x1_\mu^{h} = x^{h}$. If $\mu &lt; P$, then $\mu = \mu + 1$, go to 5.</p>
        <p>Negative phase (steps 8-16)</p>
        <p>8. $\mu = 1$.</p>
        <p>9. $x^{h} = x1_\mu^{h}$.</p>
        <p>10. Computation of the state of visible output neurons
$$P_j = \frac{1}{1 + \exp\left( -b_j^{out}(n) - \sum_{i=1}^{N^{h}} w_{ij}^{out-h}(n) x_i^{h} \right)},$$
$x_j^{out} = 1$ if $P_j \ge U(0,1)$ and $x_j^{out} = 0$ otherwise, $j \in \overline{1, N^{out}}$.</p>
        <p>11. Computation of the state of visible input neurons
$$P_j = \frac{1}{1 + \exp\left( -b_j^{in}(n) - \sum_{i=1}^{N^{h}} w_{ji}^{in-h}(n) x_i^{h} \right)},$$
$x_j^{in} = 1$ if $P_j \ge U(0,1)$ and $x_j^{in} = 0$ otherwise, $j \in \overline{1, N^{in}}$.</p>
        <p>12. $x2_\mu^{in} = x^{in}$, $x2_\mu^{out} = x^{out}$. If $\mu &lt; P$, then $\mu = \mu + 1$, go to 9.</p>
        <p>13. $\mu = 1$, $x2_{0}^{h} = 0$.</p>
        <p>14. $x^{in} = x2_\mu^{in}$, $x^{out} = x2_\mu^{out}$, $x^{h} = x2_{\mu-1}^{h}$.</p>
        <p>15. Computation of the state of hidden neurons, as in step 6, using the states set in step 14.</p>
        <p>16. $x2_\mu^{h} = x^{h}$. If $\mu &lt; P$, then $\mu = \mu + 1$, go to 14.</p>
        <p>17. Adjustment of synaptic weights and offsets based on Boltzmann's rule
$$w_{ij}^{in-h}(n+1) = w_{ij}^{in-h}(n) + \eta (\rho_{ij}^{+} - \rho_{ij}^{-}), \quad \rho_{ij}^{+} = \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu i}^{in} x1_{\mu j}^{h}, \quad \rho_{ij}^{-} = \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu i}^{in} x2_{\mu j}^{h}, \quad i \in \overline{1, N^{in}}, \ j \in \overline{1, N^{h}},$$
$$w_{ij}^{out-h}(n+1) = w_{ij}^{out-h}(n) + \eta (\rho_{ij}^{+} - \rho_{ij}^{-}), \quad \rho_{ij}^{+} = \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu i}^{out} x1_{\mu j}^{h}, \quad \rho_{ij}^{-} = \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu i}^{out} x2_{\mu j}^{h}, \quad i \in \overline{1, N^{out}}, \ j \in \overline{1, N^{h}},$$
$$w_{ij}^{h-h}(n+1) = w_{ij}^{h-h}(n) + \eta (\rho_{ij}^{+} - \rho_{ij}^{-}), \quad \rho_{ij}^{+} = \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu-1,i}^{h} x1_{\mu j}^{h}, \quad \rho_{ij}^{-} = \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu-1,i}^{h} x2_{\mu j}^{h}, \quad i \in \overline{1, N^{h}}, \ j \in \overline{1, N^{h}},$$
$$b_i^{in}(n+1) = b_i^{in}(n) + \eta \left( \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu i}^{in} - \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu i}^{in} \right), \quad i \in \overline{1, N^{in}},$$
$$b_i^{out}(n+1) = b_i^{out}(n) + \eta \left( \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu i}^{out} - \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu i}^{out} \right), \quad i \in \overline{1, N^{out}},$$
$$b_j^{h}(n+1) = b_j^{h}(n) + \eta \left( \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu j}^{h} - \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu j}^{h} \right), \quad j \in \overline{1, N^{h}},$$
where $\eta$ is the learning rate.</p>
        <p>18. If $\frac{1}{P \cdot N^{out}} \sum_{\mu=1}^{P} \sum_{i=1}^{N^{out}} | x1_{\mu i}^{out} - x2_{\mu i}^{out} | > \varepsilon$, then $n = n + 1$, go to 2.</p>
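        <p>As an illustration of steps 1-18, a compact NumPy sketch of this CD-1 procedure is given below. It vectorizes the per-sample loops where the recurrence allows; the function and array names, and the mapping eta = $\eta$, eps = $\varepsilon$, are assumptions for illustration rather than the authors' implementation.</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sample = lambda p: (p >= rng.random(p.shape)).astype(float)

def train_rbmrhl_cd1(x_in, x_out, n_h, eta=0.1, eps=0.01, max_iter=100):
    """CD-1 training sketch for the unidirectional RBMRHL.

    x_in  : (P, N_in) binary training inputs
    x_out : (P, N_out) binary training outputs
    """
    P, n_in = x_in.shape
    n_out = x_out.shape[1]
    # step 1: uniform initialization; recurrent weights symmetric, zero diagonal
    w_in = rng.uniform(-0.5, 0.5, (n_in, n_h))
    w_out = rng.uniform(-0.5, 0.5, (n_out, n_h))
    w_hh = rng.uniform(-0.5, 0.5, (n_h, n_h))
    w_hh = 0.5 * (w_hh + w_hh.T)
    np.fill_diagonal(w_hh, 0.0)
    b_in, b_out, b_h = np.zeros(n_in), np.zeros(n_out), np.zeros(n_h)

    def hidden_chain(v_in, v_out):
        # steps 4-7 / 13-16: hidden states chained over the training samples
        h = np.zeros((P, n_h))
        h_prev = np.zeros(n_h)
        for mu in range(P):
            h[mu] = sample(sigmoid(b_h + v_in[mu] @ w_in
                                   + v_out[mu] @ w_out + h_prev @ w_hh))
            h_prev = h[mu]
        return h

    for n in range(max_iter):
        h1 = hidden_chain(x_in, x_out)              # positive phase (steps 3-7)
        # negative phase (steps 8-12): reconstruct visible units from h1
        v_in = sample(sigmoid(b_in + h1 @ w_in.T))
        v_out = sample(sigmoid(b_out + h1 @ w_out.T))
        h2 = hidden_chain(v_in, v_out)              # steps 13-16
        # step 17: Boltzmann rule, data minus reconstruction correlations
        w_in += eta * (x_in.T @ h1 - v_in.T @ h2) / P
        w_out += eta * (x_out.T @ h1 - v_out.T @ h2) / P
        h1_prev = np.vstack([np.zeros(n_h), h1[:-1]])
        h2_prev = np.vstack([np.zeros(n_h), h2[:-1]])
        w_hh += eta * (h1_prev.T @ h1 - h2_prev.T @ h2) / P
        np.fill_diagonal(w_hh, 0.0)
        b_in += eta * (x_in - v_in).mean(axis=0)
        b_out += eta * (x_out - v_out).mean(axis=0)
        b_h += eta * (h1 - h2).mean(axis=0)
        # step 18: stop once the mean output reconstruction error reaches eps
        if eps >= np.abs(x_out - v_out).mean():
            break
    return dict(w_in=w_in, w_out=w_out, w_hh=w_hh,
                b_in=b_in, b_out=b_out, b_h=b_h)
        </preformat>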
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Method for determining the parameter values of a bidirectional RBMRHL model for recognition based on contrastive divergence</title>
        <p>1. Set the training iteration number $n = 1$. Initialize, by means of a uniform distribution on the interval $(0,1)$ or $[-0.5, 0.5]$, the offsets (thresholds) $b_i^{in}(n)$, $i \in \overline{1, N^{in}}$, $b_i^{out}(n)$, $i \in \overline{1, N^{out}}$, $bL_j^{h}(n)$, $bR_j^{h}(n)$, $j \in \overline{1, N^{h}}$, and the weights $wL_{ij}^{in-h}(n)$, $wR_{ij}^{in-h}(n)$, $i \in \overline{1, N^{in}}$, $j \in \overline{1, N^{h}}$, $wL_{ij}^{out-h}(n)$, $wR_{ij}^{out-h}(n)$, $i \in \overline{1, N^{out}}$, $j \in \overline{1, N^{h}}$, $wL_{ij}^{h-h}(n)$, $wR_{ij}^{h-h}(n)$, $i, j \in \overline{1, N^{h}}$, subject to $wL_{ii}^{h-h}(n) = 0$, $wR_{ii}^{h-h}(n) = 0$ and the symmetry conditions $wL_{ij}^{in-h}(n) = wL_{ji}^{in-h}(n)$, $wR_{ij}^{in-h}(n) = wR_{ji}^{in-h}(n)$, $wL_{ij}^{out-h}(n) = wL_{ji}^{out-h}(n)$, $wR_{ij}^{out-h}(n) = wR_{ji}^{out-h}(n)$, $wL_{ij}^{h-h}(n) = wL_{ji}^{h-h}(n)$, $wR_{ij}^{h-h}(n) = wR_{ji}^{h-h}(n)$.</p>
        <p>2. Specify the training set $\{ (x_\mu^{in}, x_\mu^{out}) \mid x_\mu^{in} \in \{0,1\}^{N^{in}}, x_\mu^{out} \in \{0,1\}^{N^{out}} \}$, $\mu \in \overline{1, P}$, where $x_\mu^{in}$ is the $\mu$-th training vector of states of the visible input neurons, $x_\mu^{out}$ is the $\mu$-th training vector of states of the visible output neurons, and $P$ is the power (size) of the training set.</p>
        <p>Positive phase (steps 3-11)</p>
        <p>3. $x1_\mu^{in} = x_\mu^{in}$, $x1_\mu^{out} = x_\mu^{out}$, $\mu \in \overline{1, P}$.</p>
        <p>4. $\mu = 1$, $xL1_{0}^{h} = 0$.</p>
        <p>5. $x^{in} = x1_\mu^{in}$, $x^{out} = x1_\mu^{out}$, $xL^{h} = xL1_{\mu-1}^{h}$.</p>
        <p>6. Computation of the state of hidden neurons of the forward chain
$$P_j = \frac{1}{1 + \exp\left( -bL_j^{h}(n) - \sum_{i=1}^{N^{in}} wL_{ij}^{in-h}(n) x_i^{in} - \sum_{i=1}^{N^{out}} wL_{ij}^{out-h}(n) x_i^{out} - \sum_{i=1}^{N^{h}} wL_{ij}^{h-h}(n) xL_i^{h} \right)},$$
$xL_j^{h} = 1$ if $P_j \ge U(0,1)$ and $xL_j^{h} = 0$ otherwise, $j \in \overline{1, N^{h}}$.</p>
        <p>7. $xL1_\mu^{h} = xL^{h}$. If $\mu &lt; P$, then $\mu = \mu + 1$, go to 5.</p>
        <p>8. $\mu = P$, $xR1_{P+1}^{h} = 0$.</p>
        <p>9. $x^{in} = x1_\mu^{in}$, $x^{out} = x1_\mu^{out}$, $xR^{h} = xR1_{\mu+1}^{h}$.</p>
        <p>10. Computation of the state of hidden neurons of the backward chain
$$P_j = \frac{1}{1 + \exp\left( -bR_j^{h}(n) - \sum_{i=1}^{N^{in}} wR_{ij}^{in-h}(n) x_i^{in} - \sum_{i=1}^{N^{out}} wR_{ij}^{out-h}(n) x_i^{out} - \sum_{i=1}^{N^{h}} wR_{ij}^{h-h}(n) xR_i^{h} \right)},$$
$xR_j^{h} = 1$ if $P_j \ge U(0,1)$ and $xR_j^{h} = 0$ otherwise, $j \in \overline{1, N^{h}}$.</p>
        <p>11. $xR1_\mu^{h} = xR^{h}$. If $\mu > 1$, then $\mu = \mu - 1$, go to 9.</p>
        <p>Negative phase (steps 12-24)</p>
        <p>12. $\mu = 1$.</p>
        <p>13. $xL^{h} = xL1_\mu^{h}$, $xR^{h} = xR1_\mu^{h}$.</p>
        <p>14. Computation of the state of visible output neurons
$$P_j = \frac{1}{1 + \exp\left( -b_j^{out}(n) - \sum_{i=1}^{N^{h}} wL_{ij}^{out-h}(n) xL_i^{h} - \sum_{i=1}^{N^{h}} wR_{ij}^{out-h}(n) xR_i^{h} \right)},$$
$x_j^{out} = 1$ if $P_j \ge U(0,1)$ and $x_j^{out} = 0$ otherwise, $j \in \overline{1, N^{out}}$.</p>
        <p>15. Computation of the state of visible input neurons
$$P_j = \frac{1}{1 + \exp\left( -b_j^{in}(n) - \sum_{i=1}^{N^{h}} wL_{ji}^{in-h}(n) xL_i^{h} - \sum_{i=1}^{N^{h}} wR_{ji}^{in-h}(n) xR_i^{h} \right)},$$
$x_j^{in} = 1$ if $P_j \ge U(0,1)$ and $x_j^{in} = 0$ otherwise, $j \in \overline{1, N^{in}}$.</p>
        <p>16. $x2_\mu^{in} = x^{in}$, $x2_\mu^{out} = x^{out}$. If $\mu &lt; P$, then $\mu = \mu + 1$, go to 13.</p>
        <p>17. $\mu = 1$, $xL2_{0}^{h} = 0$.</p>
        <p>18. $x^{in} = x2_\mu^{in}$, $x^{out} = x2_\mu^{out}$, $xL^{h} = xL2_{\mu-1}^{h}$.</p>
        <p>19. Computation of the state of hidden neurons of the forward chain, as in step 6, using the states set in step 18.</p>
        <p>20. $xL2_\mu^{h} = xL^{h}$. If $\mu &lt; P$, then $\mu = \mu + 1$, go to 18.</p>
        <p>21. $\mu = P$, $xR2_{P+1}^{h} = 0$.</p>
        <p>22. $x^{in} = x2_\mu^{in}$, $x^{out} = x2_\mu^{out}$, $xR^{h} = xR2_{\mu+1}^{h}$.</p>
        <p>23. Computation of the state of hidden neurons of the backward chain, as in step 10, using the states set in step 22.</p>
        <p>24. $xR2_\mu^{h} = xR^{h}$. If $\mu > 1$, then $\mu = \mu - 1$, go to 22.</p>
        <p>25. Adjustment of synaptic weights and offsets based on Boltzmann's rule
$$wL_{ij}^{in-h}(n+1) = wL_{ij}^{in-h}(n) + \eta (\rho_{ij}^{+} - \rho_{ij}^{-}), \quad \rho_{ij}^{+} = \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu i}^{in}\, xL1_{\mu j}^{h}, \quad \rho_{ij}^{-} = \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu i}^{in}\, xL2_{\mu j}^{h}, \quad i \in \overline{1, N^{in}}, \ j \in \overline{1, N^{h}},$$
$$wR_{ij}^{in-h}(n+1) = wR_{ij}^{in-h}(n) + \eta (\rho_{ij}^{+} - \rho_{ij}^{-}), \quad \rho_{ij}^{+} = \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu i}^{in}\, xR1_{\mu j}^{h}, \quad \rho_{ij}^{-} = \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu i}^{in}\, xR2_{\mu j}^{h},$$
$$wL_{ij}^{out-h}(n+1) = wL_{ij}^{out-h}(n) + \eta (\rho_{ij}^{+} - \rho_{ij}^{-}), \quad \rho_{ij}^{+} = \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu i}^{out}\, xL1_{\mu j}^{h}, \quad \rho_{ij}^{-} = \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu i}^{out}\, xL2_{\mu j}^{h}, \quad i \in \overline{1, N^{out}}, \ j \in \overline{1, N^{h}},$$
$$wR_{ij}^{out-h}(n+1) = wR_{ij}^{out-h}(n) + \eta (\rho_{ij}^{+} - \rho_{ij}^{-}), \quad \rho_{ij}^{+} = \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu i}^{out}\, xR1_{\mu j}^{h}, \quad \rho_{ij}^{-} = \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu i}^{out}\, xR2_{\mu j}^{h},$$
$$wL_{ij}^{h-h}(n+1) = wL_{ij}^{h-h}(n) + \eta (\rho_{ij}^{+} - \rho_{ij}^{-}), \quad \rho_{ij}^{+} = \frac{1}{P} \sum_{\mu=1}^{P} xL1_{\mu-1,i}^{h}\, xL1_{\mu j}^{h}, \quad \rho_{ij}^{-} = \frac{1}{P} \sum_{\mu=1}^{P} xL2_{\mu-1,i}^{h}\, xL2_{\mu j}^{h}, \quad i, j \in \overline{1, N^{h}},$$
$$wR_{ij}^{h-h}(n+1) = wR_{ij}^{h-h}(n) + \eta (\rho_{ij}^{+} - \rho_{ij}^{-}), \quad \rho_{ij}^{+} = \frac{1}{P} \sum_{\mu=1}^{P} xR1_{\mu+1,i}^{h}\, xR1_{\mu j}^{h}, \quad \rho_{ij}^{-} = \frac{1}{P} \sum_{\mu=1}^{P} xR2_{\mu+1,i}^{h}\, xR2_{\mu j}^{h},$$
$$b_i^{in}(n+1) = b_i^{in}(n) + \eta \left( \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu i}^{in} - \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu i}^{in} \right), \quad i \in \overline{1, N^{in}},$$
$$b_i^{out}(n+1) = b_i^{out}(n) + \eta \left( \frac{1}{P} \sum_{\mu=1}^{P} x1_{\mu i}^{out} - \frac{1}{P} \sum_{\mu=1}^{P} x2_{\mu i}^{out} \right), \quad i \in \overline{1, N^{out}},$$
$$bL_j^{h}(n+1) = bL_j^{h}(n) + \eta \left( \frac{1}{P} \sum_{\mu=1}^{P} xL1_{\mu j}^{h} - \frac{1}{P} \sum_{\mu=1}^{P} xL2_{\mu j}^{h} \right), \quad j \in \overline{1, N^{h}},$$
$$bR_j^{h}(n+1) = bR_j^{h}(n) + \eta \left( \frac{1}{P} \sum_{\mu=1}^{P} xR1_{\mu j}^{h} - \frac{1}{P} \sum_{\mu=1}^{P} xR2_{\mu j}^{h} \right), \quad j \in \overline{1, N^{h}}.$$</p>
        <p>26. If $\frac{1}{P \cdot N^{out}} \sum_{\mu=1}^{P} \sum_{i=1}^{N^{out}} | x1_{\mu i}^{out} - x2_{\mu i}^{out} | > \varepsilon$, then $n = n + 1$, go to 2.</p>
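        <p>The backward chain mirrors the forward one with the traversal order reversed. A sketch of how the two positive-phase hidden chains (steps 3-11) could be computed is given below, under the same naming assumptions as the previous sketches; the rest of the procedure then follows section 5.2 with separate updates for the L and R parameters.</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sample = lambda p: (p >= rng.random(p.shape)).astype(float)

def hidden_chain_dir(x_in, x_out, c, backward=False):
    """Positive-phase hidden chain for one direction of the bidirectional
    RBMRHL (sketch). c holds w_in_h, w_out_h, w_h_h, b_h for this direction;
    backward=True walks the samples from mu = P down to 1 (steps 8-11),
    with the boundary state xR1_{P+1} = 0 (or xL1_0 = 0 when forward).
    """
    P = x_in.shape[0]
    n_h = c["b_h"].shape[0]
    order = range(P - 1, -1, -1) if backward else range(P)
    h = np.zeros((P, n_h))
    h_prev = np.zeros(n_h)
    for mu in order:
        h[mu] = sample(sigmoid(c["b_h"] + x_in[mu] @ c["w_in_h"]
                               + x_out[mu] @ c["w_out_h"]
                               + h_prev @ c["w_h_h"]))
        h_prev = h[mu]
    return h

# hL1 = hidden_chain_dir(x_in, x_out, L)                  # steps 3-7
# hR1 = hidden_chain_dir(x_in, x_out, R, backward=True)   # steps 8-11
        </preformat>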
      </sec>
    </sec>
    <sec id="sec-2">
      <title>6. Numerical study</title>
      <p>The numerical study of the proposed methods for determining the parameter values was carried out in the Google Colaboratory environment using the TensorFlow package.</p>
      <p>To determine the structure of a classification model based on RBMRHL with 200 input neurons (corresponding to the number of analyzed words in each text), i.e., to determine the number of hidden neurons, a series of experiments was carried out; the results are presented in Figure 3.</p>
      <p>Figure 3: Dependence of classification accuracy on the number of hidden neurons (accuracy on the vertical axis, 20 to 260 hidden neurons on the horizontal axis).</p>
      <p>The standard IMDB data set was used as the input data for determining the values of the parameters of the neural network classification model. The criterion for choosing the structure of the neural network model was the classification accuracy. As can be seen from Figure 3, the accuracy grows as the number of hidden neurons increases. It is sufficient to use 200 hidden neurons, since with a further increase in the number of hidden neurons the change in accuracy is insignificant.</p>
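      <p>For reproducibility, one plausible way to obtain binary 200-dimensional input vectors from the IMDB set is sketched below. The multi-hot encoding over the 200 most frequent words is an assumption for illustration, since the paper does not state its exact encoding; only the standard tensorflow.keras.datasets.imdb loader is used.</p>
      <preformat>
import numpy as np
from tensorflow.keras.datasets import imdb

N_IN = 200  # number of visible input neurons in the RBMRHL model

# keep only the N_IN most frequent words; each review is a list of word indices
(x_train, y_train), _ = imdb.load_data(num_words=N_IN)

def multi_hot(reviews, dim=N_IN):
    # one binary vector per review: 1 where a word occurs, matching the
    # {0,1} visible states the RBMRHL models expect
    out = np.zeros((len(reviews), dim), dtype=float)
    for row, review in enumerate(reviews):
        out[row, list(review)] = 1.0
    return out

x_bin = multi_hot(x_train)          # shape (25000, N_IN)
y_bin = y_train.reshape(-1, 1)      # binary sentiment targets for x^out
      </preformat>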
      <p>Table 1 presents a comparative description of neural networks for recognition, where BRBMRHL means bidirectional RBMRHL. According to Table 1, BRBMRHL has the highest recognition accuracy.</p>
      <p>Table 1: Comparative description of neural networks for recognition (columns: Network, Criterion, Accuracy).</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>1. To solve the problem of increasing the accuracy of recognition of natural language constructions, the existing methods of neural network classification were investigated. These studies showed that the use of recurrent neural networks is currently the most effective approach.</p>
      <p>2. To improve the quality of recognition of natural language constructions, mathematical models of unidirectional and bidirectional stochastic neural networks RBMRHL (restricted Boltzmann machine with a recurrent hidden layer) were created, in which, unlike the traditional RBM (restricted Boltzmann machine), the hidden layer neurons are interconnected.</p>
      <p>3. For the unidirectional and bidirectional RBMRHL models, methods for identifying their parameters based on contrastive divergence were proposed.</p>
      <p>4. In the course of a numerical study of the unidirectional and bidirectional RBMRHL models, their structure was determined. The experiments showed that with 200 hidden neurons (corresponding to the number of input neurons) the accuracy no longer changes significantly, and the selected network gives recognition results with maximum accuracy.</p>
      <p>5. The proposed approach can be used in various intelligent systems that rely on the recognition of natural language constructions, for example, in supply chain management systems, where natural language interaction between subjects represented by software agents plays an important role.</p>
    </sec>
    <sec id="sec-8">
      <title>8. References</title>
      <p>[16] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, S. Khudanpur, Recurrent neural network based language model, in: 11th Annual Conference of the International Speech Communication Association, 2010, pp. 1045-1048.
[17] M. Sundermeyer, T. Alkhouli, J. Wuebker, H. Ney, Translation modeling with bidirectional recurrent neural networks, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014, pp. 14-25.
[18] M. Berglund, T. Raiko, M. Honkala, L. Kärkkäinen, A. Vetek, J. Karhunen, Bidirectional recurrent neural networks as generative models, in: Proceedings of the 28th International Conference on Neural Information Processing Systems, 2015, pp. 856-864.
[19] M. Sundermeyer, R. Schlüter, H. Ney, LSTM neural networks for language modeling, in: Thirteenth Annual Conference of the International Speech Communication Association, 2012, pp. 194-197.
[20] P. Potash, A. Romanov, A. Rumshisky, Ghostwriter: using an LSTM for automatic rap lyric generation, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 1919-1924. doi:10.18653/v1/D15-1221.
[21] E. Kiperwasser, Y. Goldberg, Simple and accurate dependency parsing using bidirectional LSTM feature representations, Transactions of the Association for Computational Linguistics 4 (2016) 313-327. doi:10.1162/tacl_a_00101.
[22] A. Graves, J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks 18 (2005) 602-610. doi:10.1016/j.neunet.2005.06.042.
[23] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, arXiv preprint arXiv:1412.3555, 2014.
[24] R. Dey, F. M. Salem, Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks, arXiv:1701.05923, 2017. URL: https://arxiv.org/ftp/arxiv/papers/1701/1701.05923.pdf.
[25] S. A. Khan, S. M. D. Khalid, M. A. Shahzad, F. Shafait, Table structure extraction with bidirectional gated recurrent unit networks, in: Proceedings of the 15th International Conference on Document Analysis and Recognition, 2019, pp. 78-88. doi:10.1109/ICDAR.2019.00220.
[26] G. E. Hinton, A Practical Guide to Training Restricted Boltzmann Machines, Technical Report UTML TR 2010-003, University of Toronto, 2010.
[27] A. Fischer, C. Igel, Training Restricted Boltzmann Machines: An Introduction, Pattern Recognition 47 (2014) 25-39. doi:10.1016/j.patcog.2013.05.025.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Dominey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hoen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Inui</surname>
          </string-name>
          ,
          <article-title>A neurolinguistic model of grammatical construction processing</article-title>
          ,
          <source>in: Journal of Cognitive Neuroscience</source>
          , vol.
          <volume>18</volume>
          , issue 12,
          <year>2006</year>
          , pp.
          <fpage>2088</fpage>
          -
          <lpage>2107</lpage>
          . doi:
          <volume>10</volume>
          .1162/jocn.
          <year>2006</year>
          .
          <volume>18</volume>
          .12.
          <year>2088</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Khairova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sharonova</surname>
          </string-name>
          ,
          <article-title>Modeling a logical network of relations of semantic items in super phrasal unities</article-title>
          ,
          <source>in: Proceedings of the 2011 9th East-West Design &amp; Test Symposium (EWDTS)</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>360</fpage>
          -
          <lpage>365</lpage>
          . doi:
          <volume>10</volume>
          .1109/EWDTS.
          <year>2011</year>
          .
          <volume>6116585</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.G.</given-names>
            <surname>Schleher</surname>
          </string-name>
          , Theory of Constraints Handbook, New York, NY,
          <string-name>
            <surname>McGraw-Hill</surname>
          </string-name>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Goldratt</surname>
          </string-name>
          ,
          <article-title>My saga to improve production, Selected Readings in Constraints Management, Falls Church, VA: APICS (</article-title>
          <year>1996</year>
          )
          <fpage>43</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Goldratt</surname>
          </string-name>
          ,
          <article-title>Production: The TOC Way (Revised Edition) including CD-ROM Simulator and Workbook, Revised edition</article-title>
          , Great Barrington, MA: North River Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Sivanandam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sumathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Deepa</surname>
          </string-name>
          , Introduction to Neural
          <source>Networks using Matlab 6</source>
          .0,
          <string-name>
            <surname>The</surname>
            <given-names>McGraw-Hill</given-names>
          </string-name>
          <string-name>
            <surname>Comp</surname>
          </string-name>
          ., Inc., New Delhi,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Haykin</surname>
          </string-name>
          ,
          <article-title>Neural networks and Learning Machines</article-title>
          , Upper Saddle River, New Jersey: Pearson Education, Inc.,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>K.-L. Du</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. M. S. Swamy</surname>
          </string-name>
          ,
          <source>Neural Networks and Statistical Learning</source>
          , Springer-Verlag, London,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fedorov</surname>
          </string-name>
          , T. Utkina, О. Nechyporenko,
          <article-title>Forecast method for natural language constructions based on a modified gated recursive block</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , vol.
          <volume>2604</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>199</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vairappan</surname>
          </string-name>
          ,
          <article-title>A novel learning method for Elman neural network using local search</article-title>
          ,
          <source>in: Neural Information Processing - Letters and Reviews</source>
          , vol.
          <volume>11</volume>
          ,
          <year>2007</year>
          , pp.
          <fpage>181</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <article-title>Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks</article-title>
          ,
          <source>arXiv:1701.05923</source>
          ,
          <year>2017</year>
          . - URL: https://arxiv.org/ftp/arxiv/papers/1701/1701.05923.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. van Merrienboer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gulcehre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bougares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>
          ,
          <source>in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Doha, Qatar,
          <year>2014</year>
          , pp.
          <fpage>1724</fpage>
          -
          <lpage>1734</lpage>
          . doi:
          <volume>10</volume>
          .3115/v1/
          <fpage>D14</fpage>
          -1179.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jaeger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Maass</surname>
          </string-name>
          , J. Prıncipe,
          <article-title>Special issue on echo state networks and liquid state machines</article-title>
          ,
          <source>Neural Networks</source>
          <volume>20</volume>
          (
          <year>2007</year>
          )
          <fpage>287</fpage>
          -
          <lpage>289</lpage>
          . doi:
          <volume>10</volume>
          .1016/j.neunet.
          <year>2007</year>
          .
          <volume>04</volume>
          .001.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A. H. S.</given-names>
            <surname>Hamdany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R. O.</given-names>
            <surname>Al-Nima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Albak</surname>
          </string-name>
          ,
          <article-title>Translating cuneiform symbols using artificial neural network</article-title>
          ,
          <source>in: TELKOMNIKA Telecommunication, Computing, Electronics and Control</source>
          , volume
          <volume>19</volume>
          ,
          <source>No. 2</source>
          ,
          <issue>2021</issue>
          , pp.
          <fpage>438</fpage>
          -
          <lpage>443</lpage>
          . doi:
          <volume>10</volume>
          .12928/telkomnika.v19i2.
          <fpage>16134</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wysocki</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ławryńczuk</surname>
          </string-name>
          ,
          <article-title>Predictive control of a multivariable neutralisation process using Elman neural networks</article-title>
          ,
          <source>in: Advances in Intelligent Systems and Computing</source>
          , Springer:Heidelberg,
          <year>2015</year>
          , pp.
          <fpage>335</fpage>
          -
          <lpage>344</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>319</fpage>
          -15796-234.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>