Intellectual Information System for Supporting Text Data Rephrasing Processes Based on Deep Learning

Nickolay Rudnichenko a, Vladimir Vychuzhanin a, Natalia Shibaeva a, Svetlana Antoshchuk a and Igor Petrov b

a Odessa National Polytechnic University, Shevchenko Avenue, 1, Odessa, 65001, Ukraine
b National University "Odessa Maritime Academy", Didrichson str., 8, Odessa, 65029, Ukraine

Abstract
This paper focuses on the specifics of developing an intelligent information system for supporting the analysis and rephrasing of large volumes of text data based on deep learning models. An overview of the key problems in the field of Internet content analysis, text data in particular, is carried out; ways of overcoming the existing difficulties are indicated; and the necessity of using artificial neural network models to automate natural language processing is substantiated. The paper presents the results of an analysis of the advantages of LSTM and Transformer models, together with their software implementation within an intelligent information system supporting dialogue and text paraphrasing functions. The system's UML design diagrams are presented, its functionality is described, and an experimental study of the effectiveness of the artificial neural network models on generated test data sets is provided. The quality of paraphrase generation was assessed using the Loss and BLEU indicators. In the conclusion, the authors describe directions for the system's further development and possible research in related areas of natural language processing based on deep learning.

Keywords
Deep learning, artificial intelligence, data analysis, text rephrasing, intellectual information systems, big data, information systems design.

IntelITSIS'2021: 2nd International Workshop on Intelligent Information Technologies and Systems of Information Security, March 24-26, 2021, Khmelnytskyi, Ukraine
EMAIL: nickolay.rud@gmail.com (N. Rudnichenko); vint532@yandex.ua (V. Vychuzhanin); nati.sh@gmail.com (N. Shibaeva); asgonpu@gmail.com (S. Antoshchuk); firmn@list.ru (I. Petrov)
ORCID: 0000-0002-7343-8076 (N. Rudnichenko); 0000-0002-6302-1832 (V. Vychuzhanin); 0000-0002-7869-9953 (N. Shibaeva); 0000-0002-9346-145X (S. Antoshchuk); 0000-0002-8740-6198 (I. Petrov)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Currently, there is active growth in the amount of data created and posted on the Internet on various applied topics. This is largely due to high consumer demand for various goods and services and for useful content. In this regard, modern business actively develops and uses technologies and methods for creating high-quality materials, including for the marketing promotion of its products, attracting experts in the required fields to develop targeted and authoritative content, and implementing existing and innovative technologies for generating the required information [1-4].

The largest volume of content circulating on the Internet belongs to multimedia materials, including video clips and photographs hosted on various platforms, owing to the wide range of ways such information can be reproduced and presented graphically and to the dynamism and interactivity of its display [5]. One of the most critical shortcomings of studying and analyzing this kind of content is the high time cost of viewing it in full, as well as the need to engage both visual and auditory perception, which is not always effective and justified.
In this regard, the most accessible, valuable and convenient content for analysis and study by a person (and economically the least costly for a business) is, historically, textual content, nowadays most often published on websites in the form of articles, essays or comments [6]. Such content can be educational, reference, advertising or other in nature, but its successful creation and use require clarity, logical coherence and uniqueness, which improves the position of the hosting website in search engines for the relevant words or phrases [7]. This, in turn, increases the number of potential customers viewing the content, helping to raise overall conversions and profit for the business.

The laboriousness of creating such content is associated with the need to find and engage experts in the relevant applied field to write it, which entails time and material costs for a preliminary assessment of the performer's capabilities, the quality of his academic or journalistic style and its fit to the target audience and the subject of the article, the cost of his work, and other factors that are difficult to formalize [8-10]. In this regard, tools, technologies, methods and systems for automating the generation of text content, its syntactic, punctuation and grammatical processing, and the assessment of the uniqueness and other quality indicators of the resulting materials are becoming increasingly important. The most promising direction for solving such problems is the theory of artificial intelligence (AI), which includes such areas as data mining (DM), machine learning (ML) and deep learning (DL) based on the use of artificial neural networks (NN) [11-14].

2. Description of problem

Owing to the development of AI concepts, a variety of application systems, tools and means are being created to provide intelligent processing and analysis of textual information in natural language [15]. However, their efficiency on large amounts of text data is often low, partly because they cannot generate complete and coherent sentences or phrases for a given subject area while accounting for its specificity and the syntactic rules of a particular language [16]. Full automation of the listed processes is possible only if the model perceives text with multi-criteria awareness at the level of human heuristics, which requires the creation of a full-fledged AI, still an unattainable task for modern science [17-20]. Nevertheless, the task of paraphrasing textual content is feasible with the DL approach, whose effectiveness has significantly increased in recent years through the development of learning algorithms for the artificial neural network models and ensembles that underlie DL [21].

The first difficulty in using this approach to develop efficient natural language processing tools is its great computational complexity: the most efficient DL algorithms require days, weeks or even months of continuous operation of computational clusters that use sets of video cards to parallelize data processing and model training [22]. The second problem is the lack of open, complete, consistent, integral and labeled data for training DL models, gathered and structured in one place.
Due to the specifics of each particular task, data must be collected from various sources, which leads to heterogeneity, anomalies, omissions and syntax errors, forcing significant resources to be spent on preprocessing and cleaning.

To solve natural language processing problems (paraphrasing in particular) with NNs, various architectures are used; the most popular and effective in practice are recurrent NN models (LSTMs in particular) and networks of the Transformer type [23]. The essential advantages of these NN architectures for the problem under consideration are:
• support for internal loops, through which the model builds a prediction of the result based on previous input data, acting as a kind of memory;
• the possibility of ranking data by importance, subsequently discarding the less significant items and preserving the more critical ones in long-term memory;
• deleting and adding data to the state of a model element by means of gate mechanisms (pointwise multiplication operations) [24, 25].

It should be noted that additional advantages of Transformer models over LSTMs, critical from the point of view of paraphrasing text content, are [26]:
• the internal attention mechanism, which identifies and links relevant words in the text and helps speed up the learning process;
• support for parallelizing computing processes within a distributed system;
• no need to process text sequences in a strictly ordered form, which also speeds up model construction [27].

Having analyzed the possibilities of the described NN architectures, it is therefore expedient to compare and study the LSTM and Transformer architectures within the framework of the problem we are solving. Given the lack of full-fledged, functional, ready-made solutions in open access, and aiming at practical use of the obtained results, it is necessary to develop our own applied software implementing the selected NN architecture.

The purpose of this article is to develop and study an intelligent information system supporting the analysis and paraphrasing of heterogeneous big text data based on deep learning capabilities (NN models in particular) in order to improve the quality of the created paraphrases.

3. System development

To study the possibilities of solving the problem under consideration, a decision was made to develop an intelligent information system (IIS) as a client-server software application based on the Python 3.7 programming language and the Django framework for the server-side logic, the PyCharm integrated development environment, the PostgreSQL database management system for storing user data, the Anaconda distribution with Jupyter Notebook (for the computational experiments studying the effectiveness of NN models), as well as the software libraries sklearn (creating the necessary objects, validating and evaluating metric values), matplotlib (visualizing the obtained data dependencies), PyTorch (creating and tuning NN models, setting hyperparameter values and starting computational processes), and Pandas and NumPy (processing input data arrays and structuring them for input to NN models).
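To make the comparison concrete, below is a minimal sketch of how the two candidate architectures can be instantiated with PyTorch (one of the libraries listed above); the layer sizes and tensor shapes are illustrative placeholders, not the configurations studied in Section 4:

```python
import torch
import torch.nn as nn

# Recurrent branch: an LSTM over embedded token sequences.
# batch_first=True means inputs of shape (batch, seq_len, features).
inp_sz = 256                                   # illustrative feature size
lstm = nn.LSTM(input_size=inp_sz, hidden_size=2 * inp_sz,
               num_layers=1, batch_first=True)

# Attention branch: a full encoder-decoder Transformer.
transformer = nn.Transformer(d_model=512, nhead=4,
                             num_encoder_layers=6, num_decoder_layers=6,
                             dim_feedforward=512, activation="relu")

# Shape check only, to show how each model consumes sequences.
x = torch.randn(8, 20, inp_sz)                 # (batch, seq, features)
out, (h, c) = lstm(x)                          # out: (8, 20, 512)

src = torch.randn(20, 8, 512)                  # (seq, batch, d_model)
tgt = torch.randn(15, 8, 512)
y = transformer(src, tgt)                      # y: (15, 8, 512)
```

The LSTM consumes the sequence step by step, its hidden state playing the role of memory, while the Transformer attends to all positions at once, which is what enables the parallelization noted above.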
The IIS project is designed using the UML language and is a chat application for communicating with real people (registered users of the system) and virtual interlocutors (chat bots) built on the created NN models, as well as for issuing tasks to chat bots to generate paraphrases. A diagram of the main use cases of the system is shown in Fig. 1.

Figure 1: Diagram of the system's main use cases

According to the developed use case diagram, the IIS user has the following abilities:
• registration in the system to create an account for further work in the application;
• authorization in the system under an account (login and password) to enter the user's personal account and use its functionality;
• exiting and deleting the created account;
• searching for users registered in the system for dialogue;
• adding a found user account to the contact list;
• sending files and messages to other users;
• removing a user from the contact list, deleting chat history;
• choosing a chat bot from the list of available ones;
• conducting communication with a virtual interlocutor;
• sending a command to one or several chat bots to build a set of paraphrases;
• selecting and saving the required variant of a paraphrased sentence from those issued;
• assessing each of the received paraphrases by the criterion of acceptability (0 - incorrect paraphrase, 1 - correct).

The general process of user interaction with the system consists of several stages:
1. Going to the system's web page through a browser at its URL and registering. The data entered by the user during registration are checked for compliance with the necessary criteria (password complexity and length, login uniqueness) and saved in the database, and a message with a generated one-time account activation link is sent to the specified email address.
2. User authorization in the system by login and password. The data are validated on the client side (profile activity, correctness of the entered data); in case of errors, a corresponding information message is issued. After successful authorization, a control trigger developed in the database creates empty tables for the user's contact lists, chats, messages and sent files, which are filled in during the use of the system.
3. Selecting operations. After the user is authorized, his profile is loaded, from which it is possible to go to the page for searching for interlocutors, managing personal data and created chats (opening or deleting an existing chat, creating a new one), viewing the communication history, or choosing a chat bot.
4. Chatting with a person. After the desired user or existing chat is selected, a corresponding request is sent to the server side via the POST method; once it is processed, the necessary data are loaded from the database and displayed on the correspondence page, which consists of a text field, a message feed, a panel for interface settings and text display formatting, a panel of stylized graphic images, and buttons for sending files and saving the correspondence to an external txt file.
5. Interaction with the chat bot.
When choosing the function of communicating with a chatbot, the user can pick an option from the available list of existing NN model presets on which the bots are based, and view their brief description, characteristics and functional purpose (simple communication and paraphrase generation are available). For more convenient configuration while corresponding via the text line, a number of commands are available for setting restrictions on the bot, given with the key "-cb -%key=%value", where key is a control command and value is its value (a number or a string). For example, the following options are available: limiting the number of words used in constructing a paraphrase; entering stop words that are prohibited from use; selecting the case and an additional language; setting the number of generated paraphrases.

The main components implementing the designated functionality are summarized in Fig. 2. In this diagram, we can see that the system user accesses the main Chat App component, which is responsible for the relationships and interaction of the Chat, Chat Profile and Chat History components. The Chat component is responsible for sending and receiving messages; Chat Profile implements the functionality of the chat state according to the user's profile; Chat History provides serialization, deserialization and storage of correspondence data. All components are linked to the data security component (the cryptography dependency from the open Python repository) and the database (the django.db.backends.postgresql dependency).

Figure 2: Diagram of key system components

The Neural Networks component loads the created LSTM and Transformer models into the business logic of the chat bot for conducting correspondence and generating paraphrases in the Chat component.

4. Experiments and results analysis

Collecting reliable and adequately labeled data for building NN models is a difficult task; therefore, for this study it is advisable to use existing data sets as the training sample for the NNs solving the paraphrase generation problem. Among the available datasets are the Quora Question Dataset [28] and Para-NMT-50m [29]. The first of these is designed for detecting duplicate questions on the Quora site. It consists of pairs of questions and a label that indicates whether they are meaningfully identical. In view of the limited nature of this set, its full use for the problem being solved is not optimal, since as a result of training the NN model would be able to generate only interrogative sentences. However, it can be partially adapted for the task of maintaining a conversation with the user. Para-NMT-50m is more suitable for the task of generating paraphrases, because it was compiled with the help of automatic translation systems and includes 50,000,000 pairs of sentences that are paraphrases of each other. The complexity of fully processing this data set lies in its high volume and the laboriousness of performing the calculations on the available hardware. Therefore, by generalization and aggregation by means of the used libraries, the sample size was reduced to 7,500,000 sentence pairs, divided as follows: 6,000,000 for the training sample; 1,000,000 for the test sample; 500,000 for the validation sample.
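As an illustration of the split described above, the following is a minimal sketch assuming the reduced set of pairs has already been exported to a file; the file name and the columns source and paraphrase are assumptions for illustration, not part of the system:

```python
import pandas as pd

# Load the reduced sample of 7,500,000 sentence pairs
# (hypothetical file; columns 'source' and 'paraphrase' are illustrative).
pairs = pd.read_csv("para_nmt_reduced.tsv", sep="\t",
                    names=["source", "paraphrase"])

# Shuffle once so all three samples come from the same distribution.
pairs = pairs.sample(frac=1.0, random_state=42).reset_index(drop=True)

train = pairs.iloc[:6_000_000]                 # training sample
test = pairs.iloc[6_000_000:7_000_000]         # test sample
valid = pairs.iloc[7_000_000:]                 # validation sample (500,000)
```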
To use the data correctly and submit them to the input of the LSTM and Transformer NN models, it was necessary to tokenize them (using the WordTokenizer class of the used library) and compose a dictionary data structure in which a number is assigned to each token, after which the sentences were translated into sequences of numbers. The performance of each architecture is assessed on the previously created test dataset using the BLEU metric.

During the training of the models, two optimizers were used: Adam and SGD. Adam is based on using the mean of the second moments of the gradients; its advantages are efficiency on large amounts of data, invariance to gradient scaling, and ease of implementation. The SGD optimizer performs stochastic gradient descent; its ease of use is due to its ability to ensure convergence while avoiding local minima and the absence of the need to load all the training data into the memory of the computing device at once [30].

The Adam hyperparameters changed during the computational experiments are: params (registers the NN model weights), lr (controls the model's learning rate), betas (used to calculate the running mean of the gradient and of its square), eps (excludes the possibility of division by 0), weight_decay (controls the reduction of model weights to ensure regularization), and n_epochs (limits the number of training epochs). The SGD optimizer, in addition to the hyperparameters indicated above, also uses momentum (acceleration of processing gradient vectors) and dampening (control of the acceleration of data processing in time).

To compare the operation of the models, a number of computational experiments were carried out, 4 of which are shown in Table 1.

Table 1
The values of the specified hyperparameters of the LSTM model and optimizers

Experiment 1. LSTM: hidden_size = 2*inp_sz, num_layers = 1, bias = False, batch_first = True, dropout = 0, bidirectional = False. Optimizer: SGD, lr = 0.01, momentum = 0, dampening = 0.01, weight_decay = 0, n_epochs = 10 000.
Experiment 2. LSTM: hidden_size = 2*inp_sz, num_layers = 1, bias = True, batch_first = True, dropout = 0.3, bidirectional = False. Optimizer: SGD, lr = 0.0001, momentum = 0.05, dampening = 0, weight_decay = 0.01, n_epochs = 10 000.
Experiment 3. LSTM: hidden_size = 3*inp_sz, num_layers = 2, bias = False, batch_first = True, dropout = 0.4, bidirectional = False. Optimizer: SGD, lr = 0.001, momentum = 0.1, dampening = 0, weight_decay = 0.05, n_epochs = 10 000.
Experiment 4. LSTM: hidden_size = 3*inp_sz, num_layers = 2, bias = True, batch_first = True, dropout = 0.35, bidirectional = False. Optimizer: Adam, lr = 0.001, betas = 0.9; 0.99, eps = 0.000001, weight_decay = 0.1, n_epochs = 20 000.

To build NN models based on the LSTM architecture, the following hyperparameters were varied: input_size (the number of features in the input data), hidden_size (the dimension of the hidden layer of neurons), num_layers (the total number of NN model layers), bias (whether bias parameters are used), batch_first (places the batch dimension before the time dimension), dropout (regulates the suppression of weights on the next NN layer), bidirectional (selects a bidirectional LSTM, to account for the sequence both before and after a word in the sample), vocab_sz (the size of the dictionary), and emb_sz (the size of the embedding matrix).
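For example, a sketch of the configuration from experiment 4 of Table 1 is given below, assuming PyTorch; vocab_sz and emb_sz are illustrative values, and the training loop itself is omitted:

```python
import torch.nn as nn
import torch.optim as optim

vocab_sz, emb_sz = 30_000, 256                 # illustrative sizes
inp_sz = emb_sz

embedding = nn.Embedding(vocab_sz, emb_sz)     # token id -> dense vector
lstm = nn.LSTM(input_size=inp_sz, hidden_size=3 * inp_sz,
               num_layers=2, bias=True, batch_first=True,
               dropout=0.35, bidirectional=False)

# params registers the weights of both modules with the optimizer.
params = list(embedding.parameters()) + list(lstm.parameters())

# Adam, as in experiment 4 of Table 1.
optimizer = optim.Adam(params, lr=0.001, betas=(0.9, 0.99),
                       eps=1e-6, weight_decay=0.1)

# The SGD variant used in experiments 1-3 differs only in the optimizer,
# e.g. for experiment 3:
# optimizer = optim.SGD(params, lr=0.001, momentum=0.1,
#                       dampening=0, weight_decay=0.05)
```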
To build NN models based on the Transformer architecture, the following hyperparameters were varied: d_model (the number of parameters in the encoder and decoder), nhead (the number of attention heads), num_encoder_layers and num_decoder_layers (the number of neuron layers in the encoder and decoder), dim_feedforward (the dimension of the fully connected layers of the NN model), activation (the relu or gelu activation function), and src and tgt (the sequences fed to the encoder and decoder). The obtained dependences of the training errors of the LSTM models are shown in Fig. 3.

Figure 3: LSTM models' training error dependences

Based on the results of the first experiment, it can be noted that the LSTM model was trained for 10,000 epochs (the values on the abscissa are scaled 1 to 1000). The value of the loss function gradually dropped from 7.5 to 5.3 on the validation set; notably, starting from the 5000th epoch, the loss function indicators sharply deteriorated, which may be due to the high learning rate. The second experiment is characterized by the loss function dropping gradually (more smoothly than in the previous case) from 7.7 to 4.5, which indicates a more appropriate selection of the model's hyperparameters. In the third experiment the model's performance deteriorated: the loss function decreased from a higher starting value and stopped at 5.3. This effect is caused by the NN model becoming more complex and losing effectiveness due to the increased number of hidden layers. The fourth experiment showed that with a significant increase in the number of training epochs (up to 20,000) and a change of optimizer to Adam, the loss function gradually dropped from 8.4 to 3.6 on the validation set, the smallest value, providing the highest accuracy among the obtained LSTM models. As a number of other experiments showed, further increasing the number of epochs and decreasing the betas parameter allow the final value of the loss function to be reduced to 2.6, but the learning process becomes very time-consuming and laborious in terms of computation, and the time costs begin to grow exponentially.

The values of the specified hyperparameters and the results of the experiments on the Transformer model are shown in Table 2 and Fig. 4, respectively. According to the experiments, the Transformer model has the best loss function indicator, whose level was reduced to a minimum value of 0.6.

Table 2
Values of the specified Transformer model and optimizer hyperparameters

Experiment 1. Transformer: d_model = 512, nhead = 4, num_encoder_layers = 6, num_decoder_layers = 6, dim_feedforward = 512, dropout = 0.2, activation = relu. Optimizer: Adam, lr = 0.0001, betas = 0.9; 0.99, eps = 0.000001, weight_decay = 0.1, n_epochs = 10 000.
Experiment 2. Transformer: d_model = 512, nhead = 8, num_encoder_layers = 12, num_decoder_layers = 12, dim_feedforward = 1024, dropout = 0.3, activation = gelu. Optimizer: Adam, lr = 0.01, betas = 0.88; 0.95, eps = 0.000001, weight_decay = 0.05, n_epochs = 10 000.

Figure 4: Transformer models' training error dependencies

For a more holistic check of the paraphrase generation quality of the models, it is necessary to use the BLEU metric, which in practice correlates with the loss function (the lower the value of the loss function, the higher the BLEU).
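A minimal sketch of how such a corpus-level BLEU score can be computed over tokenized model outputs is given below; NLTK is used here purely for illustration, since the paper does not name the exact implementation, and the sentence pairs are made up:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Each hypothesis is a tokenized generated paraphrase; each entry in
# references is a list of acceptable reference tokenizations for it.
references = [[["what", "is", "your", "age"]],
              [["he", "arrived", "late", "yesterday"]]]
hypotheses = [["how", "old", "are", "you"],
              ["yesterday", "he", "arrived", "late"]]

smooth = SmoothingFunction().method1          # avoids zero n-gram counts
score = corpus_bleu(references, hypotheses, smoothing_function=smooth)
print(f"BLEU = {100 * score:.2f}")            # 0-100 scale, as in Table 3
```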
The results of evaluating the BLEU metric for the created NN models are shown in Table 3. As we can see, the Transformer model shows the best result in solving the problem of generating paraphrases. The BLEU metric shows how similar a generated sentence is to the reference sentence in the dataset.

Table 3
BLEU metric values

Experiment   LSTM    Transformer
1            12.23   29.81
2            21.56   46.73
3            14.02   —
4            28.39   —

Based on the practical use of the developed system in various scenarios on different texts, it should be noted that with the Transformer models, on average, in 7 out of 10 cases a paraphrase is generated correctly and satisfies expectations; for the LSTM models this value is 3-4 out of 10. It should also be noted that all the sentences generated within the framework of the study are complete, with a meaning close to the initial sentences, which confirms the correctness of the aggregation of the taken dataset.

5. Conclusion

The studies carried out made it possible to establish the adequacy of using DL for solving paraphrase generation problems. The experimental results revealed a higher accuracy of the Transformer models compared to LSTM (approximately 1.5-2 times). In addition to the described business tasks related to the promotion of goods or services on the Internet, the developed system, owing to its modular structure, can be further supplemented with a number of functional capabilities and used to solve problems in various applied areas, in particular:
• to increase the efficiency of searching for borrowings in texts, not only by the direct occurrence of given shingles (as implemented in existing uniqueness-checking systems based on parsing search results for relevant phrases, word combinations and sentences), but also by possible variations in word order and by taking synonyms into account;
• to automate the detection of duplicates in text data by indirect occurrences;
• to automate the process of supplementing training data in machine translation systems: expanded by adding paraphrased sentences, datasets become more flexible, increasing the efficiency of the trained DL models.

A possible area of further research within the framework of the considered problem is the semantic analysis of text content entered by the user in the form of files or messages, according to various criteria for assessing difficult-to-formalize characteristics of the text, including sentiment, emotional coloring or complexity of perception.

6. References

[1] N. Rudnichenko, V. Vychuzhanin, N. Shybaieva, D. Shybaiev, T. Otradskaya, I. Petrov, The use of machine learning methods to automate the classification of large text data arrays. Information management systems and technologies. Problems and solutions. Ecology, Odessa (2019) 31-46.
[2] V.V. Vychuzhanin, N.D. Rudnichenko, Information technology methods in the diagnostics of the state of complex technical systems: monograph, Odessa (2019) (in Russian).
[3] D.S. Shybaiev, T.V. Otradskaya, M.V. Stepanchuk, N.O. Shybaieva, N.D. Rudnichenko, Predicting system for the estimated cost of real estate objects development using neural networks. ZhSTU Herald. Technical science 83 (2020) 154-160.
[4] N. Rudnichenko, V. Vychuzhanin, I. Petrov, D. Shibaev, Decision Support System for the Machine Learning Methods Selection in Big Data Mining. Proceedings of The Third International Workshop on Computer Modeling and Intelligent Systems (CMIS-2020), session 6 "Intelligent Information Technologies" (2020) 872-885.
[5] N. Rudnichenko, S. Antoshchuk, V. Vychuzhanin, A. Ben, I. Petrov, Information System for the Intellectual Assessment of Customers' Text Reviews Tonality Based on Artificial Neural Networks. Proceedings of the 9th International Conference "Information Control Systems & Technologies" (2020) 371-385.
[6] N.D. Rudnichenko, V.V. Vychuzhanin, N.O. Shibaeva, D.S. Shibaev, T.V. Otradskaya, I.M. Petrov, Software development for interactive big data mining based on machine learning methods. Actual problems of information systems and technologies: monograph (2020) 59-71.
[7] U. Sivarajah, M. Mustafa Kamal, Z. Irani, V. Weerakkody, Critical analysis of Big Data challenges and analytical methods. Journal of Business Research 70 (2017) 263-286.
[8] P. Bala, Introduction of Big Data With Analytics of Big Data. Advanced Deep Learning Applications in Big Data Analytics (2021) 110-125.
[9] W. Lidong, A. Cheryl, Machine Learning in Big Data. International Journal of Mathematical, Engineering and Management Sciences 1 (2016) 52-61. doi:10.33889/IJMEMS.2016.1.2-006.
[10] Q. Junfei, W. Qihui, D. Guoru, X. Yuhua, F. Shuo, A survey of machine learning for big data processing. EURASIP Journal on Advances in Signal Processing (2016). doi:10.1186/s13634-016-0355-x.
[11] J. Won, K. Gun, L. Jiwon, L. Hyunwuk, H. Dongho, R. Won, Deep learning with GPUs. Advances in Computers (2021). doi:10.1016/bs.adcom.2020.11.003.
[12] H. Ryan, K. Panagiotis, P. Charalampos, T. Lazaros, L. Xing, L. George, A Prototype Deep Learning Paraphrase Identification Service for Discovering Information Cascades in Social Networks. IEEE International Conference on Multimedia and Expo (2020). doi:10.1109/ICMEW46912.2020.9106044.
[13] S. Will, Neural Networks in Big Data and Web Search. Data 4 (2018). doi:10.3390/data4010007.
[14] L. Zichao, J. Xin, S. Lifeng, L. Hang, Paraphrase Generation with Deep Reinforcement Learning. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (2018) 3865-3878. doi:10.18653/v1/D18-1421.
[15] B.O. Bliznyuk, L.V. Vasilieva, I.D. Strelnikov, D.S. Tkachuk, Modern methods of natural language processing. Visnik of Kharkiv National University of the Name of V.N. Karazin (2017).
[16] W. Qiang, L. Bei, X. Tong, Z. Jingbo, L. Changliang, W. Derek, C. Lidia, Learning Deep Transformer Models for Machine Translation (2019) 1810-1822. doi:10.18653/v1/P19-1176.
[17] V.S. Yuskov, I.V. Barannikova, Comparative Analysis of Natural Language Processing Platforms. Mining information and analytical bulletin 3 (2017) 272-278.
[18] H. Hassan, C. Beneki, S. Unger, M. Mazinani, M. Yeganegi, Text Mining in Big Data Analytics. Big Data and Cognitive Computing 4 1 (2020). doi:10.3390/bdcc4010001.
[19] S. Zhou, Research on the Application of Deep Learning in Text Generation. Journal of Physics: Conference Series 1693 (2020). doi:10.1088/1742-6596/1693/1/012060.
[20] A.O. Soshnikov, The role of the functional-semantic type of speech in the process of automatic processing of natural language. Eurasian Union of Scientists 15 (2015) 134-135.
[21] N.I. Widiastuti, Deep Learning – Now and Next in Text Mining and Natural Language Processing. IOP Conference Series: Materials Science and Engineering 407 (2018). doi:10.1088/1757-899X/407/1/012114.
[22] H. Li, Deep learning for natural language processing: Advantages and challenges. National Science Review 5 (2018). doi:10.1093/nsr/nwx110.
[23] C. Hou, C. Zhou, K. Zhou, J. Sun, S. Xuanyuan, A Survey of Deep Learning Applied to Story Generation (2019). doi:10.1007/978-3-030-34139-8_1.
[24] T. Young, D. Hazarika, S. Poria, E. Cambria, Recent Trends in Deep Learning Based Natural Language Processing. IEEE Computational Intelligence Magazine 13 (2018) 55-75.
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), NY, USA (2017) 6000-6010.
[26] X. Zhang, M. Chen, Y. Qin, NLP-QA Framework Based on LSTM-RNN (2018) 307-311. doi:10.1109/ICDSBA.2018.00065.
[27] K. Adam, K. Smagulova, A. James, Memristive LSTM Architectures (2020). doi:10.1007/978-3-030-14524-8_12.
[28] Quora Question Dataset, 2020. URL: https://www.kaggle.com/c/quora-question-pairs.
[29] ParaNMT-50M, 2020. URL: https://www.aclweb.org/anthology/P18-1042/.
[30] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).