<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>COLINS-</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Linguistic Constructions Translation Method Based on Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eugene Fedorov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Nechyporenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cherkasy State Technological University</institution>
          ,
          <addr-line>Shevchenko blvd., 460, Cherkasy, 18006</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>7</volume>
      <fpage>20</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>The paper proposes a linguistic constructions translation method based on recurrent neural networks. The novelty of the study lies in the fact that to ensure the interaction of software agents representing subjects within supply chains, four artificial neural network models for the translation of the linguistic structures were created, a criterion for evaluating the training effectiveness of the proposed models was selected, and the parameters of the proposed models were identified based on the Adam method. In the created models, unlike the existing translational neural networks, the decoder does not have feedback from the output layer to the hidden layer. The developed models and methods for their parametric identification make it possible to improve the accuracy of translation of natural language constructions. The created natural language constructions translation method based on neural networks can be used in various intelligent computer systems that use the translation of linguistic constructions.</p>
      </abstract>
      <kwd-group>
        <kwd>supply chain</kwd>
        <kwd>multi-agent interaction</kwd>
        <kwd>artificial neural network</kwd>
        <kwd>translations of linguistic constructions</kwd>
        <kwd>Adam method</kwd>
        <kwd>linguistic constructions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Literature review</title>
      <p>– Agent Building Shell. Coordinating the actions of agents representing the firm and supply chain
actors interacting with it (for example, suppliers and customers) using the COOL coordination
language;</p>
      <p>– MetaMorph. Coordination of the actions of agents representing the firm, supply chain actors
interacting with it and intermediaries, using the actions of intermediaries;</p>
      <p>– NetMan. Management of agents representing business units of firms and interacting within the
same firm and between firms through agreements;</p>
      <p>– BPMAT &amp; SCL. BPMAT models firm activity, SCL models inter-firm flows;</p>
      <p>– MASCOT. Coordination of the actions of agents representing the company and the subjects of the
supply chain interacting with it. It is used to improve supply chain flexibility through planning and
scheduling. It coordinates production across multiple sites and evaluates new products and strategic
business decisions (such as production, purchase, or supplier selection) based on capacity and material
needs across the entire supply chain;</p>
      <p>– DASCh. Management of agents representing the firm, product flows, and information flows to
investigate gaps in these flows;</p>
      <p>– Task dependency network. Management of agents representing the firm and supply chain entities
interacting with it using the auction protocol (selection of agents through an auction);</p>
      <p>– MASC. Management of agents representing the firm and supply chain entities interacting with it
using the auction protocol (selection of agents through an auction);</p>
      <p>– OCEAN. Management of agents representing the firm and the subjects of the supply chain
interacting with it through negotiations. Uses competition at the local level and cooperation at the
global level.</p>
      <p>The multi-agent systems listed above do not support computer simulation of the interaction of supply
chain entities based on linguistic constructions and soft computing.</p>
      <p>Today, artificial intelligence methods are used to translate linguistic constructions. The most
popular of them is the connectionist approach [6-8], which for many network types allows the use of parallel
learning methods [9-11].</p>
      <p>The following recurrent networks are most often used as neural networks for translation:
• Elman neural network (ENN or SRN) [12, 13], the simplest of the recurrent neural networks;
• bidirectional recurrent neural network (BRNN) [14, 15], which is built from two Elman
neural networks;
• long short-term memory (LSTM) [16, 17];
• bidirectional long short-term memory (BLSTM) [18, 19], which is built from two LSTM
neural networks;
• gated recurrent unit (GRU) [20, 21];
• bidirectional gated recurrent unit (BGRU) [22], which is built from two GRU neural
networks.</p>
      <p>The advantages of neural networks are [20, 23]:
• the possibility of their training and adaptation;
• the ability to identify patterns in the data, their generalization, i.e., extraction of knowledge
from data, therefore knowledge about the object (for example, its mathematical model) is not
required;
• parallel processing of information, which increases computing power.</p>
      <p>The disadvantages of neural networks are [24, 25]:
• the high probability of the learning and adaptation method hitting a local extremum;
• the difficulty of determining the structure of the network, since there are no algorithms for
calculating the number of layers and neurons in each layer for specific applications;
• inaccessibility for human understanding of the knowledge accumulated by the network (it is
impossible to represent the relationship between input and output in the form of rules), since this
knowledge is distributed among all elements of the neural network and is represented by its
weight coefficients;
• the difficulty of forming a representative sample.</p>
      <p>Thus, none of the networks satisfies all the criteria.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed methodology</title>
    </sec>
    <sec id="sec-4">
      <title>3.1 Modified Seq2seq neural network</title>
      <p>It is assumed that a text input sequence of length $P$ has already been converted to an encoded input
sequence $x$ (for example, by word2vec), and the length of each encoded word of the input sequence
is $N^{(0)}$. It is assumed that an encoded output sequence $y$ of length $P$ will be converted to a text
output sequence, and the length of each encoded word of the output sequence is equal to $N^{(3)}$. The
number of neurons in the hidden layers of the encoder and decoder is equal to $N^{(1)}$ and $N^{(2)}$, respectively.</p>
      <p>Functioning of the ANN</p>
      <p>1. Initialization</p>
      <p>$n = 1$, $m = 1$, $\mu = 1$, $y^{(2)}_{m-1,i} = 0$, $i \in \overline{1, N^{(2)}}$.</p>
      <p>2. Calculation of the output signal for the hidden layer of the encoder</p>
      <p>$y^{(0)}_{ni} = x_{\mu i}$, $i \in \overline{1, N^{(0)}}$,</p>
      <p>$y^{(2)}_{mj} = f^{(2)}(s^{(2)}_{mj})$, $j \in \overline{1, N^{(2)}}$,</p>
      <p>$\dots\; w^{(2)}_{ij}\, y^{(2)}_{m-1,\,i-N^{(1)}}$,</p>
    </sec>
    <sec id="sec-5">
      <title>3.2 Modified additive attention neural network</title>
      <p>It is assumed that a text input sequence of length $P$ has already been converted to an encoded input
sequence $x$ (for example, by word2vec), and the length of each encoded word of the input sequence
is $N^{(0)}$. It is assumed that an encoded output sequence $y$ of length $P$ will be converted to a text
output sequence, and the length of each encoded word of the output sequence is equal to $N^{(3)}$. The
number of neurons in the hidden layers of the encoder and decoder is equal to $N^{(1)}$ and $N^{(2)}$, respectively.</p>
      <p>Functioning of the ANN
1. Initialization
n = 1, m = 1, µ = 1,
4.2. Context calculation</p>
      <p>$c_{mi} = \sum_{n=1}^{P} a_{mn}\, y^{(1)}_{ni}$, $i \in \overline{1, N^{(1)}}$.</p>
      <p>Calculation of the output signal for the hidden and output layers of the decoder</p>
      <p>$y^{(2)}_{mj} = f^{(2)}(s^{(2)}_{mj})$, $j \in \overline{1, N^{(2)}}$,</p>
      <p>$s^{(2)}_{mj} = b^{(2)}_j + \sum_{i=1}^{N^{(1)}} w^{(2)}_{ij}\, c_{mi} + \dots$</p>
    </sec>
    <sec id="sec-6">
      <title>3.3 Modified multiplicative attention neural network</title>
      <p>It is assumed that a text input sequence of length $P$ has already been converted to an encoded input
sequence $x$ (for example, by word2vec), and the length of each encoded word of the input sequence
is $N^{(0)}$. It is assumed that an encoded output sequence $y$ of length $P$ will be converted to a text
output sequence, and the length of each encoded word of the output sequence is equal to $N^{(3)}$. The
number of neurons in the hidden layers of the encoder and decoder is equal to $N^{(1)}$ and $N^{(2)}$, respectively.</p>
      <p>2. Calculation of the output signal for the hidden layer of the encoder with the concatenation of
all outputs</p>
      <p>$s^{(1)}_{nj} = b^{(1)}_j + \sum_{i=1}^{N^{(0)}} w^{(1)}_{ij}\, y^{(0)}_{ni} + \sum_{i=N^{(0)}+1}^{N^{(0)}+N^{(1)}} w^{(1)}_{ij}\, y^{(1)}_{n-1,\,i-N^{(0)}}$.</p>
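      <p>As an illustration, the weighted sum for the encoder hidden layer can be sketched in plain Python. This is a minimal sketch under stated assumptions rather than the authors' implementation: the names encoder_step and encode, the tanh activation, the zero initial hidden state, and the example weights are all hypothetical choices made for the example.</p>
      <preformat>
```python
import math


def encoder_step(x, h_prev, w, b):
    """One Elman encoder step: returns the new hidden output vector.

    x      -- encoded input word, length N0
    h_prev -- hidden output from the previous step, length N1
    w      -- weights w[i][j], i over the N0 + N1 concatenated inputs
    b      -- biases, length N1
    """
    z = list(x) + list(h_prev)  # inputs followed by the previous hidden output
    h = []
    for j in range(len(b)):
        s = b[j] + sum(w[i][j] * z[i] for i in range(len(z)))
        h.append(math.tanh(s))  # tanh is an assumed activation function
    return h


def encode(sequence, w, b):
    """Run the encoder over all P words and keep every hidden output."""
    h = [0.0] * len(b)  # assumed zero initial hidden state
    outputs = []
    for x in sequence:
        h = encoder_step(x, h, w, b)
        outputs.append(h)
    return outputs


# Example: N0 = 2 inputs, N1 = 2 hidden neurons, a sequence of P = 2 words.
w = [[0.5, 0.0], [0.0, 0.5], [0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
hidden = encode([[1.0, 0.0], [0.0, 1.0]], w, b)
print(len(hidden), len(hidden[0]))  # prints "2 2"
```
      </preformat>
      <p>Keeping every hidden output, and not only the last one, matches the attention-based variants, where the context is computed from all encoder outputs.</p>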
      <p>Checking the completion of encoding. If n &lt; P , then µ = µ + 1 , n = n + 1, go to 2.
Calculation of the output signal for the hidden layer of the decoder
5.2. Context calculation</p>
      <p>$c_{mi} = \sum_{n=1}^{P} a_{mn}\, y^{(1)}_{ni}$, $i \in \overline{1, N^{(1)}}$.</p>
      <p>Calculation of the output signal for the hidden and output layers of the decoder</p>
      <p>$y^{(2)}_{mj} = \tanh\left( b^{(2)}_j + \sum_{i=1}^{N^{(1)}} w^{(2)}_{ij}\, c_{mi} + \sum_{i=N^{(1)}+1}^{N^{(1)}+N^{(2)}} w^{(2)}_{ij}\, y^{(2)}_{m-1,\,i-N^{(1)}} \right)$, $j \in \overline{1, N^{(2)}}$,</p>
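      <p>The context calculation and the decoder hidden layer can be illustrated with a short Python sketch. It assumes that the attention weights for decoder step m are already available as a list a_m; how they are computed is not shown here. The function names context and decoder_step and all numeric values are hypothetical and serve only the example.</p>
      <preformat>
```python
import math


def context(a_m, enc_outputs):
    """Context vector: c[i] = sum over n of a_m[n] * enc_outputs[n][i]."""
    n1 = len(enc_outputs[0])
    return [sum(a_m[n] * enc_outputs[n][i] for n in range(len(enc_outputs)))
            for i in range(n1)]


def decoder_step(c_m, h_prev, w, b):
    """Decoder hidden output from the context and the previous hidden output.

    w[i][j] is indexed over the N1 context components followed by the
    N2 components of the previous decoder hidden output.
    """
    z = list(c_m) + list(h_prev)
    return [math.tanh(b[j] + sum(w[i][j] * z[i] for i in range(len(z))))
            for j in range(len(b))]


# Example: P = 2 encoder outputs with N1 = 2 components each.
enc = [[1.0, 0.0], [0.0, 1.0]]
c = context([0.25, 0.75], enc)
print(c)  # prints [0.25, 0.75]

# One decoder step with N2 = 1 hidden neuron and a zero previous state.
h = decoder_step(c, [0.0], [[1.0], [1.0], [0.5]], [0.0])
print(len(h))  # prints 1
```
      </preformat>
      <p>Note that the decoder step receives only the context and its own previous hidden output, with no feedback from the output layer, which is the modification described for the proposed models.</p>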
    </sec>
    <sec id="sec-7">
      <title>3.4 An additive attention neural network with pointing</title>
      <p>In this paper, we propose an additive attention neural network with pointing for translation
(similar to Figure 2), which is a recurrent network. The proposed network
includes an encoder, an attention mechanism, a pointing mechanism, and a decoder. Unlike
the classical additive attention neural network [30], the decoder does not have feedback from the
output layer to the hidden layer, and a pointing mechanism is added. The attention mechanism allows
the decoder to focus on specific encoder outputs. The pointing mechanism allows the
decoder to focus on specific encoder inputs.</p>
      <p>Let us limit our consideration to the functioning of the ANN and the case when the encoder and
decoder are based on ENN.</p>
      <p>It is assumed that a text input sequence of length $P$ has already been converted to an encoded input
sequence $x$ (for example, by word2vec), and the length of each encoded word of the input sequence
is $N^{(0)}$. It is assumed that an encoded output sequence $y$ of length $P$ will be converted to a text
output sequence, and the length of each encoded word of the output sequence is equal to $N^{(3)}$. The
number of neurons in the hidden layers of the encoder and decoder is equal to $N^{(1)}$ and $N^{(2)}$, respectively.</p>
      <p>Functioning of the ANN
1. Initialization
n = 1 , m = 1, µ = 1,</p>
      <p>2. Calculation of the output signal for the hidden layer of the encoder</p>
      <p>$s^{(1)}_{nj} = b^{(1)}_j + \sum_{i=1}^{N^{(0)}} w^{(1)}_{ij}\, y^{(0)}_{ni} + \sum_{i=N^{(0)}+1}^{N^{(0)}+N^{(1)}} w^{(1)}_{ij}\, y^{(1)}_{n-1,\,i-N^{(0)}}$.</p>
      <p>3. Checking the completion of encoding. If $n &lt; P$, then $\mu = \mu + 1$, $n = n + 1$, go to step 2.</p>
      <p>4. Additive attention. Concatenative (additive) attention is used, which connects the encoder's
hidden layer and the decoder's hidden layer.</p>
      <p>4.1. Calculation of weights (scores) of attention</p>
      <p>4.2. Context calculation</p>
      <p>Pointing</p>
      <p>$s^{(3)}_{mj} = b^{(3)}_j + \sum_{i=1}^{N^{(0)}} w^{(3)}_{ij}\, y^{(0)}_{ni} + \sum_{i=N^{(0)}+1}^{N^{(0)}+N^{(1)}} w^{(3)}_{ij}\, c_{m,\,i-N^{(0)}} + \sum_{i=N^{(0)}+N^{(1)}+1}^{N^{(0)}+N^{(1)}+N^{(2)}} w^{(3)}_{ij}\, y^{(2)}_{m-1,\,i-N^{(0)}-N^{(1)}}$.</p>
      <p>Calculation of the output signal for the hidden and output layers of the decoder</p>
      <p>$y^{(2)}_{mj} = f^{(2)}(s^{(2)}_{mj})$, $j \in \overline{1, N^{(2)}}$,</p>
      <p>$s^{(2)}_{mj} = b^{(2)}_j + \sum_{i=1}^{N^{(1)}} w^{(2)}_{ij}\, c_{mi} + \dots$</p>
      <p>3.5 Criteria for evaluating the effectiveness of neural network translation models</p>
      <p>In this work, model adequacy was chosen as the criterion for training the neural network translation
models: the parameter values $W$ are chosen so as to deliver
maximum accuracy (the coincidence of the model output with the desired output):</p>
      <p>$F = \frac{1}{P} \sum_{\mu=1}^{P} [d_\mu = y_\mu] \to \max_W$, $[d_\mu = y_\mu] = \begin{cases} 1, \; d_\mu = y_\mu \\ 0, \; d_\mu \ne y_\mu \end{cases}$ (1)</p>
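      <p>The accuracy criterion can be illustrated with a few lines of Python. This is a minimal sketch, assuming that the desired outputs and the model outputs are given as two equally long sequences of comparable items; the function name accuracy and the sample words are hypothetical.</p>
      <preformat>
```python
def accuracy(desired, predicted):
    """Fraction of positions where the model output equals the desired output."""
    if len(desired) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not desired:
        raise ValueError("sequences must be non-empty")
    # Each matching pair contributes 1, each mismatch contributes 0.
    matches = sum(1 for d, y in zip(desired, predicted) if d == y)
    return matches / len(desired)


# Example: three of the four outputs coincide with the desired ones.
print(accuracy(["el", "gato", "come", "pan"], ["el", "perro", "come", "pan"]))  # prints 0.75
```
      </preformat>
      <p>Training then amounts to searching for the parameter values that make this fraction as large as possible.</p>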
      <sec id="sec-7-1">
        <title>3.6 Method for determining the values of parameters of neural network translation models based on the Adam method</title>
        <p>Training of neural network translation models is subject to criterion (1).</p>
        <p>Let the weight vector be defined as</p>
        <p>$w(n) = (w^{(1)}_{11}(n), \dots, w^{(L)}_{N^{(L-1)} N^{(L)}}(n))^T = (w_1(n), \dots, w_{N_w}(n))^T$,</p>
        <p>where $L$ is the maximum number of layers.</p>
        <p>Let the ANN error energy be defined as</p>
        <p>$E(n) = \frac{1}{2} \sum_j e_j^2(n)$, $e_j(n) = d_j(n) - y_j(n)$,</p>
        <p>where $y_j(n)$ is the output of the $j$th neuron of the output layer and
$d_j(n)$ is the training output of the $j$th neuron of the output layer.</p>
        <p>Let the vector of partial derivatives (gradient) be defined as</p>
        <p>$g(n) = \left( \frac{\partial E(n)}{\partial w_1(n)}, \dots, \frac{\partial E(n)}{\partial w_{N_w}(n)} \right)^T$.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Step 1. Initialization.</title>
        <p>Step 1.1. The initial vector of the weights $w(0)$ is set.</p>
        <p>Step 1.2. The initial vector of the first moments $m(-1) = 0$ is set.</p>
        <p>Step 1.3. The initial vector of the second moments $v(-1) = 0$ is set.</p>
        <p>Step 1.4. The parameter $\eta$ is set, which determines the learning rate (typically $\eta = 0.001$); the decay rates
of the first and second moments are set as $\beta_1$ and $\beta_2$, respectively, $\beta_1, \beta_2 \in [0, 1)$ (typically $\beta_1 = 0.9$
and $\beta_2 = 0.999$); and a stability parameter $\varepsilon$ is set to prevent division by zero (typically
$\varepsilon = 10^{-8}$).</p>
        <p>Step 1.5. The initial gradient $g(0)$ is calculated.</p>
        <p>Step 1.6. $n = 0$.</p>
        <p>Step 2. The vector of the first moments is calculated based on the exponential moving average</p>
        <p>$m(n) = \beta_1 m(n-1) + (1 - \beta_1) g(n)$.</p>
        <p>Step 3. The vector of the second moments is calculated based on the exponential moving average</p>
        <p>$v(n) = \beta_2 v(n-1) + (1 - \beta_2) g^2(n)$.</p>
        <p>Step 4. The vector of weights is calculated (the moments are corrected because they are initialized with
zeros, and the training step is scaled):</p>
        <p>• traditional variant</p>
        <p>$\hat{m}(n) = m(n) / (1 - \beta_1^{n+1})$, $\hat{v}(n) = v(n) / (1 - \beta_2^{n+1})$,</p>
        <p>$w(n+1) = w(n) - \frac{\eta\, \hat{m}(n)}{\sqrt{\hat{v}(n)} + \varepsilon}$;</p>
        <p>• variant AMSGrad</p>
        <p>$\tilde{v}(n) = \max\{\tilde{v}(n-1), \hat{v}(n)\}$, where $\tilde{v}(-1) = 0$,</p>
        <p>$w(n+1) = w(n) - \frac{\eta\, \hat{m}(n)}{\sqrt{\tilde{v}(n)} + \varepsilon}$.</p>
        <p>Step 5. Gradient $g(n+1)$ is calculated.</p>
        <p>4. Experiments and results</p>
        <p>The numerical study of the proposed methods for determining the parameter values was carried out
in the Google Colaboratory environment using the TensorFlow package.</p>
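        <p>For reference, the parameter update of the Adam-based method (Steps 1 to 5) can be sketched in plain Python, independently of any framework. This is an illustrative sketch and not the code used in the experiments: the function name adam_minimize, the fixed number of steps used as a stopping rule, and the one-dimensional quadratic test problem are assumptions made for the example. The gradient is computed at the start of each iteration, which is equivalent to computing the next gradient at the end of the previous one.</p>
        <preformat>
```python
import math


def adam_minimize(grad, w0, steps, eta=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, amsgrad=False):
    """Minimize a function given its gradient, using Adam or AMSGrad.

    grad  -- function mapping a weight list to a gradient list
    w0    -- initial weights
    steps -- number of iterations (an assumed stopping rule)
    """
    w = list(w0)
    m = [0.0] * len(w)      # first moments, initialized with zeros
    v = [0.0] * len(w)      # second moments, initialized with zeros
    v_max = [0.0] * len(w)  # running maximum, used only by AMSGrad
    for n in range(steps):
        g = grad(w)
        for k in range(len(w)):
            # Exponential moving averages of the gradient and its square.
            m[k] = beta1 * m[k] + (1.0 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1.0 - beta2) * g[k] * g[k]
            # Bias correction for the zero initialization.
            m_hat = m[k] / (1.0 - beta1 ** (n + 1))
            v_hat = v[k] / (1.0 - beta2 ** (n + 1))
            if amsgrad:
                v_max[k] = max(v_max[k], v_hat)
                v_hat = v_max[k]
            w[k] -= eta * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Example: minimize E(w) = 0.5 * (w - 3)^2, whose gradient is w - 3.
def quadratic_grad(w):
    return [w[0] - 3.0]


w_adam = adam_minimize(quadratic_grad, [0.0], steps=1000, eta=0.1)
w_ams = adam_minimize(quadratic_grad, [0.0], steps=1000, eta=0.1, amsgrad=True)
print(w_adam, w_ams)  # both values are expected to be close to 3
```
        </preformat>
        <p>The learning rate of 0.1 in the example is larger than the typical value of 0.001 only so that the toy problem converges in a small number of steps.</p>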
        <p>To determine the structure (that is, the number of hidden neurons) of the modified Seq2seq, additive attention,
multiplicative attention, and additive attention with pointing neural network models with 256 input neurons,
several experiments were carried out. Their results are presented in
Figure 4.</p>
        <p>The standard data set spa-eng (English-Spanish dictionary) from
http://www.manythings.org/anki was used as input data to determine the
values of the parameters of the neural network translation models. The data set contained 20000 records (18000 for training and 2000
for validation). The criterion for choosing the structure of the neural network model was translation
accuracy.</p>
        <p>[Figure 4: accuracy as a function of the number of hidden neurons]</p>
        <p>As can be seen from Figure 4, with an increase in the number of hidden neurons, the accuracy
value increases. For translation, it is sufficient to use 256 hidden neurons (corresponding to the
number of input neurons), since with a further increase in the number of hidden neurons, the change
in the accuracy value is insignificant. Similar studies were carried out on the standard datasets fra-eng
(English-French dictionary) and ita-eng (English-Italian dictionary) and similar results were obtained.</p>
        <p>Table 1 presents a comparative description of neural networks for translation.</p>
        <p>5. Conclusions</p>
        <p>1. To solve the problem of improving the accuracy of the translation of linguistic structures, the
existing methods of neural network translation were investigated. These studies have shown that
recurrent neural networks are currently the most effective.</p>
        <p>2. To improve the quality of the translation of linguistic structures, mathematical models of
modified Seq2seq, additive attention, multiplicative attention, and additive
attention with pointing neural networks were created. Unlike the corresponding traditional neural networks, the
decoder of the proposed modified neural networks has no feedback from the output layer to
the hidden layer, i.e., the decoder's structure coincides with the encoder's structure, which simplifies
the practical implementation of the decoder.</p>
        <p>3. In the course of a numerical study of neural network translation models, their structure was
determined. The experiments showed that increasing the number of hidden neurons beyond 256 (corresponding to
the number of neurons for encoding one word) does not change the accuracy value significantly,
so the selected network gives translation results with maximum accuracy.</p>
        <p>4. The proposed approach can be used in various intelligent systems that use the translation of
linguistic structures, for example, in computer systems for supply chain management, where
natural language interaction between subjects, which are represented by computer agents, plays an
important role.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>6. References</title>
      <p>[15] M. Sundermeyer, T. Alkhouli, J. Wuebker, H. Ney, Translation modeling with bidirectional
recurrent neural networks, in: Proceedings of the Conference on Empirical Methods on Natural
Language Processing, 2014, pp. 14-25.
[16] P. Potash, A. Romanov, A. Rumshisky, Ghostwriter: using an LSTM for automatic rap lyric
generation, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language
Processing, 2015, pp. 1919–1924. doi: 10.18653/v1/D15-1221.
[17] B. Cheng, X. Xu, Y. Zeng, J. Ren, S. Jung, Pedestrian trajectory prediction via the Social-Grid
LSTM model, in: The 2nd Asian Conference on Artificial Intelligence Technology, volume
2018, no. 16, 2018, pp. 1468–1474. doi: 10.1049/joe.2018.8316.
[18] R. Jin, Z. Chen, K. Wu, M. Wu, X. Li, R. Yan, Bi-LSTM-based two-stream network for machine
remaining useful life prediction, IEEE Transactions on Instrumentation and Measurement,
3167778 (2022). doi: 10.1109/TIM.2022.3167778
[19] E. Kiperwasser, Y. Goldberg, Simple and Accurate Dependency Parsing Using Bidirectional
LSTM Feature Representations, Transactions of the Association for Computational Linguistics 4
(2016) 313–327. doi: 10.1162/tacl_a_00101.
[20] R. Dey, F. M. Salem, Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks,
arXiv:1701.05923, 2017. URL: https://arxiv.org/ftp/arxiv/papers/1701/1701.05923.pdf.
[21] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural
networks on sequence modeling, arXiv preprint arXiv:1412.3555, 2014.
[22] S. A. Khan, S. M. D. Khalid, M. A. Shahzad, F. Shafait, Table structure extraction with
bidirectional gated recurrent unit networks, in: International Conference on Document Analysis
and Recognition (ICDAR), volume 4, No. 2, 2019, pp. 78–88. doi: 10.1109/ICDAR.2019.00220.
[23] M. Artetxe, G. Labaka, E. Agirre, Unsupervised Statistical Machine Translation, in: Proceedings
of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 3632–
3642. doi: 10.18653/v1/D18-1399.
[24] A. H. S. Hamdany, R. R. O. Al-Nima, L. H. Albak, Translating cuneiform symbols using
artificial neural network, in: TELKOMNIKA Telecommunication, Computing, Electronics and
Control, volume 19, No. 2, 2021, pp. 438-443. doi: 10.12928/telkomnika.v19i2.16134.
[25] M. Artetxe, G. Labaka, E. Agirre, An Effective Approach to Unsupervised Machine Translation,
in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics,
2019, pp. 194-203. doi: 10.18653/v1/P19-1019.
[26] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase
representations using RNN encoder-decoder for statistical machine translation, in: Proceedings of
the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha,
Qatar, 2014, pp. 1724–1734. doi: 10.3115/v1/D14-1179.
[27] I. Sutskever, O. Vinyals, Q. V. Le, Sequence to Sequence Learning with Neural
Networks, in: Proceedings of the 27th International Conference on Neural Information
Processing Systems (NIPS'14), Montreal, Canada, volume 2, 2014, pp. 3104–3112.
[28] D. Bahdanau, K. Cho, Yo. Bengio, Neural Machine Translation by Jointly Learning to Align and
Translate, in: International Conference on Learning Representations, 2015, pp.1-15. doi:
10.48550/arXiv.1409.0473
[29] M.-Th. Luong, H. Pham, Ch. D. Manning, Effective Approaches to Attention-based Neural
Machine Translation, in: Proceedings of the Conference on Empirical Methods in Natural
Language Processing, 2015, pp. 1412–1421.
[30] A. See, P. J. Liu, C. D. Manning, Get to the point: Summarization with pointer-generator
networks. arXiv:1704.04368v2, 2017. doi:10.48550/arXiv.1704.04368.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Aharoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <article-title>Towards string-to-tree neural machine translation</article-title>
          ,
          <source>in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics</source>
          , volume
          <volume>2</volume>
          : Short Papers,
          <year>2017</year>
          , pp.
          <fpage>132</fpage>
          -
          <lpage>140</lpage>
          . doi: 10.18653/v1/P17-2021.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Akoury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iyyer</surname>
          </string-name>
          ,
          <article-title>Syntactically supervised transformers for faster neural machine translation</article-title>
          ,
          <source>in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1269</fpage>
          -
          <lpage>1281</lpage>
          . doi: 10.18653/v1/P19-1122.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ghadimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ghassemi Toosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Heavey</surname>
          </string-name>
          ,
          <article-title>A multi-agent systems approach for sustainable supplier selection and order allocation in a partnership supply chain</article-title>
          ,
          <source>European Journal of Operational Research</source>
          ,
          <volume>269</volume>
          (
          <year>2018</year>
          )
          <fpage>286</fpage>
          -
          <lpage>301</lpage>
          . doi: 10.1016/j.ejor.2017.07.014.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ghadimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Heavey</surname>
          </string-name>
          ,
          <article-title>Intelligent sustainable supplier selection using multiagent technology: Theory and application for Industry 4.0 supply chains</article-title>
          ,
          <source>Computers &amp; Industrial Engineering</source>
          ,
          <volume>127</volume>
          (
          <year>2019</year>
          )
          <fpage>588</fpage>
          -
          <lpage>600</lpage>
          . doi: 10.1016/j.cie.2018.10.050.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Swierczek</surname>
          </string-name>
          ,
          <article-title>Decentralization of information and supply chain self-organization: the resulting effect on network performance in the transitive service triads</article-title>
          ,
          <source>Supply Chain Management</source>
          ,
          <volume>28</volume>
          (
          <year>2022</year>
          )
          <fpage>425</fpage>
          -
          <lpage>449</lpage>
          . doi: 10.1108/scm-05-2021-0266.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Belinkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sajjad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Durrani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dalvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Glass</surname>
          </string-name>
          ,
          <article-title>Identifying and controlling important neurons in neural machine translation</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          . doi: 10.48550/arXiv.1811.01157.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Baniata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-B.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <article-title>A multitask-based neural machine translation model with part-of-speech tags integration for Arabic dialects</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>2502</volume>
          (
          <year>2018</year>
          ). doi: 10.3390/app8122502.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.-L.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N. S.</given-names>
            <surname>Swamy</surname>
          </string-name>
          ,
          <source>Neural Networks and Statistical Learning</source>
          , Springer-Verlag, London,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fedorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lukashenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patrushev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lukashenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rudakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mitsenko</surname>
          </string-name>
          ,
          <article-title>The method of intelligent image processing based on a three-channel purely convolutional neural network</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>2255</volume>
          ,
          <year>2018</year>
          , pp.
          <fpage>336</fpage>
          -
          <lpage>351</lpage>
          . doi:10.1109/EWDTS.2013.6673185.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Shvachych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. V.</given-names>
            <surname>Ivaschenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. V.</given-names>
            <surname>Busygin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ye. Ye.</given-names>
            <surname>Fedorov</surname>
          </string-name>
          ,
          <article-title>Parallel computational algorithms in thermal processes in metallurgy and mining</article-title>
          ,
          <source>Naukovyi Visnyk Natsionalnoho Hirnychoho Universytetu</source>
          ,
          <volume>4</volume>
          (
          <year>2018</year>
          )
          <fpage>129</fpage>
          -
          <lpage>137</lpage>
          . doi:10.29202/nvngu/2018-4/19.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Shlomchak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Shvachych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Moroz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fedorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kozenkov</surname>
          </string-name>
          ,
          <article-title>Automated control of temperature regimes of alloyed steel products based on multiprocessors computing systems</article-title>
          ,
          <source>Metalurgija</source>
          ,
          <volume>58</volume>
          (
          <year>2019</year>
          )
          <fpage>299</fpage>
          -
          <lpage>302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Jozefowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaremba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>An Empirical Exploration of Recurrent Network Architectures</article-title>
          ,
          <source>in: Proceedings of the 32nd International Conference on Machine Learning</source>
          , volume
          <volume>37</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>2342</fpage>
          -
          <lpage>2350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wysocki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ławryńczuk</surname>
          </string-name>
          ,
          <article-title>Predictive control of a multivariable neutralisation process using Elman neural networks</article-title>
          ,
          <source>in: Advances in Intelligent Systems and Computing</source>
          , Springer: Heidelberg,
          <year>2015</year>
          , pp.
          <fpage>335</fpage>
          -
          <lpage>344</lpage>
          . doi:10.1007/978-3-319-15796-2_34.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Berglund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Raiko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Honkala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kärkkäinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vetek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karhunen</surname>
          </string-name>
          ,
          <article-title>Bidirectional recurrent neural networks as generative models</article-title>
          ,
          <source>in: Proceedings of the 28th International Conference on Neural Information Processing Systems</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>856</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>