Text Tonality Classification Using a Hybrid Convolutional Neural Network with Parallel and Sequential Connections Between Layers

Roman Peleshchak1, Vasyl Lytvyn1, Ivan Peleshchak1, Andriy Khudyy1, Zoriana Rybchak1, Solomiya Mushasta1

1 Lviv Polytechnic National University, 12 Stepana Bandera Street, Lviv, 79000, Ukraine

Abstract
The analysis of text tonality is a pressing problem in natural language processing, one that is often solved with convolutional neural networks (CNNs). However, most CNN models focus only on local features and ignore global ones. In this paper, a hybrid convolutional neural network with parallel-sequential connections between layers, combined with a max-pooling layer applied to the matrix of the original text, is proposed for the analysis of text tonality. The proposed hybrid network extracts text features using a parallel-connected convolutional block, then classifies these features after combining them with the original text features. The proposed model is able to learn both local and global features of short texts and has a shorter convergence time and lower computational cost than the parallel DenseNet. On six different databases, the hybrid convolutional neural network with parallel-sequential connections between layers classifies text tonality more accurately than the baseline models CNN, TextCNN, FastText, and DPCNN.

Keywords
Text tonality, classification, convolutional neural network.

1. Introduction

The challenges of natural language processing are becoming increasingly important due to the ever-growing amount of information on the Internet and the need to navigate it. Widely used natural language processing tasks include text classification, building chatbots and generating answers to user questions, machine translation from one language to another, language recognition, spell checking, identifying and annotating parts of speech in a sentence, and rewriting textual information to create web content.

A labeled dataset containing text documents and their labels is used to train the classifier. Text classification is widely used in tonal analysis (IMDb and Yelp review classification), stock market analysis, and automated e-mail responses. Methods based on deep learning of neural networks have become common practice alongside classical text-mining algorithms. The following neural network architectures are used to solve text classification problems: the recurrent neural network, the hierarchical attention network, and the convolutional neural network [1].

Document classification is the process of assigning documents to a certain category depending on their content. Text classification is needed to solve the following tasks: personalization in advertising; dividing sites into thematic catalogs; fighting deceptive or misleading advertising correspondence (spam); and text tone recognition, i.e. determining the emotional coloring of a text.

COLINS-2022: 6th International Conference on Computational Linguistics and Intelligent Systems, May 12–13, 2022, Gliwice, Poland
EMAIL: rpele@ukr.net (R. Peleshchak); vasyl17.lytvyn@gmail.com (V. Lytvyn); peleshchakivan@gmail.com (I. Peleshchak); Khudyy@ukr.net (A. Khudyy); zozylka3@gmail.com (Z. Rybchak); solomiyanytrebych@gmail.com (S. Mushasta)
ORCID: 0000-0002-0536-3252 (R. Peleshchak); 0000-0002-9676-0180 (V. Lytvyn); 0000-0002-7481-8628 (I. Peleshchak); 0000-0003-2029-7270 (A. Khudyy); 0000-0002-5986-4618 (Z. Rybchak); 0000-0003-4932-4113 (S. Mushasta)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

The aim of this work is to develop a new convolutional neural network architecture with parallel and sequential connections between layers that classifies the tonality of texts with increased efficiency.

2. Literature review

Most modern machine learning algorithms focus on describing the features of objects, so all documents are converted into a real-valued feature space. The underlying idea is that words are responsible for a document's membership in a certain class, and that texts of one class will share many similar words. The best-known ways to transform text into a feature space are based on statistical information about words. Each object (text) is converted into a vector whose length is equal to the number of words in all sample texts. There are three main strategies for extracting features in text tonality analysis: the Bag of Words model [1], the word embedding model [2, 3], and the graph network model [4]. Fig. 1 presents the structure of research on text tonality analysis according to these three main feature extraction methods.

Figure 1: Text tonality analysis according to the three main methods used to extract features

The main assumption of the Bag-of-words model is that the order of words in a document is not important, and a collection of documents can be considered as a simple selection of "document–word" pairs $(d, w)$, where $d \in D$, $D = \{d_1, \ldots, d_n\}$ is the set of text documents, and $w \in W_d$, where $W_d = (w_1, \ldots, w_{n_d})$ is the sequence of words and $n_d$ is the length of document $d$. All documents are represented as a matrix $T = \| t_{d,w} \|$, each row of which corresponds to a certain document or text and each column to a certain word. The element $t_{d,w}$ is the number of occurrences of the word $w$ in document $d$.

Assume that a sentence is a syntactically ordered set of $m$ words. Each word is one-hot encoded in an $m$-dimensional coding: each word $w$ corresponds to a vector of length $m$ whose component at the position equal to the ordinal number of the word in the sentence is 1, and all other components are 0. The bag-of-words model was therefore extended with feature extraction methods such as part-of-speech tagging (POS tagging, POST) and N-gram phrase tagging. Part-of-speech tagging, also known as grammatical tagging or identification of parts of speech, is the process of assigning a word in a text to a certain part of speech based on its definition and context, i.e. on its relation to the neighboring words in a phrase, sentence, or paragraph.
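As a simple illustration of the document–word matrix $T = \| t_{d,w} \|$ described above, here is a minimal Python sketch; the toy documents are hypothetical examples, not the authors' code or data.

```python
from collections import Counter

# Toy corpus D = {d1, d2} (hypothetical, for illustration only)
documents = ["the film was great", "the film was boring"]

# Vocabulary: every word that occurs in the sample texts
vocabulary = sorted({w for d in documents for w in d.split()})

# T[d][w] = number of occurrences of word w in document d
T = []
for d in documents:
    counts = Counter(d.split())
    T.append([counts[w] for w in vocabulary])

print(vocabulary)  # ['boring', 'film', 'great', 'the', 'was']
print(T)           # [[0, 1, 1, 1, 1], [1, 1, 0, 1, 1]]
```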
The most popular way to convert text into a vector is the Bag-of-words & TF-IDF model [5]. As in the Bag-of-words model, all documents are represented as a matrix $T = \| t_{d,w} \|$, but the element $t_{d,w}$ is the value of the function $TF\text{-}IDF(w, d, D)$ of the word $w \in W_d$ in the document $d \in D$.

Definition 1. TF-IDF is a statistical measure used to evaluate how important a word is in the context of a document. It is calculated by the formula

$TF\text{-}IDF(w, d, D) = TF(w, d) \cdot IDF(w, D)$, (1)

where $TF$ is the term frequency, which evaluates the importance of the word $w_i$ within a given document:

$TF(w, d) = \dfrac{n_i}{\sum_{k=1}^{m} n_k}$, (2)

where $n_i$ is the number of occurrences of word $i$ in the document and $\sum_{k=1}^{m} n_k$ is the total number of words in the document; and $IDF$ is the inverse document frequency, which reduces the weight of commonly used words:

$IDF(w, D) = \log \dfrac{|D|}{|\{d_i \ni w_i\}|}$, (3)

where $|D|$ is the number of documents in the corpus and $|\{d_i \ni w_i\}|$ is the number of documents in which the word $w_i$ occurs.

Often the information in a text is carried not only by individual words but also by sequences of words, i.e. phrases and phraseological units. In this case, the Bag of N-grams & TF-IDF model is used to obtain language features when converting text into a vector. N-grams [6] are sequences of $N$ words in which a single word depends on several others; an N-gram indicates that $N$ words are related in text classification tasks. The Bag of N-grams & TF-IDF model is similar to the Bag-of-words & TF-IDF model, except that the feature vector for each document contains, besides the TF-IDF of words, the TF-IDF of all $N$-word sequences:

$TF\text{-}IDF(w, d, D, N) = TF(w, d) \cdot IDF(w, D) + TF(N, d) \cdot IDF(N, D)$, (4)

$TF(N, d) \cdot IDF(N, D) = \dfrac{N_g}{\sum_{g=1}^{M} N_g} \cdot \log \dfrac{|D|}{|\{d_g \ni P_g\}|}$, (5)

where $N_g$ is the number of occurrences of the N-gram $g$ in the document, $\sum_{g=1}^{M} N_g$ is the total number of N-grams in the document (in this case $M = m$), and $|\{d_g \ni P_g\}|$ is the number of documents in which the N-gram $P_g$ occurs.

POS tagging and N-grams were combined in [7, 8]. According to experimental results, this model can improve classification accuracy. However, analyses of the tonality of short texts [9, 10] found that this model cannot achieve satisfactory accuracy there, because in short texts the composition of sentences is quite arbitrary, so applying part-of-speech tagging to their analysis is impractical [9].

There are three main approaches to analyzing the tonality of short texts, based on vocabulary, traditional machine learning, and deep learning. The vocabulary-based model of text tonality analysis obtains the sentiment tendency of a text by scoring it against words that carry sentiment information. The model based on traditional machine learning is independent of a sentiment vocabulary and can learn the sentiment features of texts on its own. The deep learning model of text sentiment analysis captures more advanced and implicit sentiment features of texts [11]; however, the features obtained this way are abstract and difficult to express clearly.

At the level of differentiation within a sentence, the authors of [12] developed a classification system that can identify words conveying sentiment polarity. The authors of [13] used a classification algorithm built on SentiWordNet emotion scores and achieved significant performance improvements on six evaluation datasets. SentiWordNet is an opinion lexicon that assigns three mood scores to each WordNet synset: positivity, negativity, and objectivity. Some studies have shown that using SentiWordNet to score the mood of a word and adding the score as a feature can improve the accuracy of tonality analysis [7, 14].

The literature review above shows that text classification based on the bag-of-words model can achieve better results if word features are properly extracted.
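As a brief illustration of formulas (1)–(3), here is a minimal Python sketch of the TF-IDF computation. It is a simplified reading of the formulas above (no smoothing), not a reference implementation; the corpus is a made-up toy example.

```python
import math
from collections import Counter

def tf(word: str, document: str) -> float:
    # TF(w, d) = n_i / sum_k n_k, Eq. (2)
    words = document.split()
    return Counter(words)[word] / len(words)

def idf(word: str, corpus: list) -> float:
    # IDF(w, D) = log(|D| / |{d_i : w occurs in d_i}|), Eq. (3)
    containing = sum(1 for d in corpus if word in d.split())
    return math.log(len(corpus) / containing)

def tf_idf(word: str, document: str, corpus: list) -> float:
    # TF-IDF(w, d, D) = TF(w, d) * IDF(w, D), Eq. (1)
    return tf(word, document) * idf(word, corpus)

corpus = ["the film was great", "the film was boring", "a dull plot"]
print(tf_idf("great", corpus[0], corpus))  # rare word -> higher weight
print(tf_idf("the", corpus[0], corpus))    # common word -> lower weight
```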
However, the bag-of-words model also has drawbacks: it does not take into account the order of words in a sentence, i.e. syntax, and cannot convey deep semantic features and semantic combinations. Text tonality analysis based on the word embedding model solves this problem of the bag-of-words model by representing words as vectors in a multidimensional feature space [15]. The best-known methods represent words with a fixed-length vector whose length equals the number of words used in the sample [16]; each such vector consists of zeros and ones.

Word2Vec [17] is a technology focused on the statistical processing of large arrays of textual information. Word2Vec collects statistics on word occurrences in the data, removes the least and/or most common words, then solves the dimensionality reduction problem using neural network methods and produces compact vector representations of words of a predetermined length. In doing so, Word2Vec maximizes the cosine similarity between vectors of words that occur in similar contexts and minimizes the cosine similarity between words that do not occur together. Cosine similarity measures the similarity between two vectors; for vectors $A$ and $B$ it is calculated by the formula

$\text{similarity} = \cos(\theta) = \dfrac{A \cdot B}{\|A\| \, \|B\|} = \dfrac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$ (6)

(a short numerical sketch of Eq. (6) is given below). It should be noted that two different neural network architectures can be used to implement the Word2Vec conversion of a word into a vector: Continuous Bag of Words and Skip-gram.

The word embedding model is based on the principle of distributional similarity and has a smoothing property. Another advantage of the word embedding model is that it is an unsupervised learning method. It has been shown that the word embedding model can capture more semantic and grammatical features than the bag-of-words model, which allows it to achieve very good results in a variety of natural language processing tasks. The authors of [2, 15] developed the QVEC method to evaluate the effectiveness of feature representations across different text analysis models; their findings show that, by the QVEC score for 300-dimensional word vectors, text tonality assessment based on the word embedding model outperforms the other models.

In recent years, combinations of the word embedding model and deep learning models have been used for text analysis with better performance. The authors of [18] developed a word embedding learning algorithm that combines word vectors with an RNN and can be successfully applied to speech recognition. The authors of [19, 20] combined word vectors with long short-term memory (LSTM) to achieve better efficiency. Although the CNN developed by the authors of [1] has only one convolutional layer, its classification efficiency is much better than that of conventional machine learning classification algorithms; however, this method cannot extract dependencies in long texts. In 2017, the authors of [21] captured dependencies in long texts by deepening the network. The authors of [22] presented a structure similar to DenseNet, using shortcut connections between the upper and lower convolution blocks so that larger features can be obtained from combinations of smaller ones; however, the model used a convolution kernel of a fixed size that slid from the beginning of the text to the end, creating a feature map. The authors of [23] introduced a few-shot learning method for text classification.
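As flagged above, here is a minimal NumPy sketch of the cosine similarity from Eq. (6). The word vectors are made-up toy values, not trained embeddings.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Eq. (6): cos(theta) = (A . B) / (||A|| * ||B||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional embeddings of three words
v_good = np.array([0.8, 0.1, 0.3, 0.5])
v_great = np.array([0.7, 0.2, 0.4, 0.6])
v_terrible = np.array([-0.6, 0.9, -0.2, -0.4])

print(cosine_similarity(v_good, v_great))     # close to 1: similar contexts
print(cosine_similarity(v_good, v_terrible))  # negative: dissimilar contexts
```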
The authors of [24, 25] developed models of textual steganography that combine text generation with information hiding and achieved favorable results. The authors of [26] used LSTM-based multi-entity temporal features to detect spam; the results in [26] showed good performance for tonality analysis of long texts. A parallel DenseNet is proposed in [27], built on traditional densely connected convolutional networks, for short-text tonality analysis. In particular, that paper proposes two new feature extraction units based on DenseNet and a multiscale convolutional neural network. The model can extract both local and global short-text features by combining the output with the features extracted by the parallel feature extraction block and then sending the combined features to the final classifier.

As noted above, the principle of bag-of-words-based tonality analysis is to put all words into a single "bag": when a word appears in a sentence, its position in the vector is set to 1 and the positions of the other words to 0, so the words in the sentence are unordered. POS tagging and N-gram phrase tagging were developed to compensate for this, with N-gram tagging based on the fact that one word depends on several others, usually the preceding ones.

GloVe [28] obtains a fixed-length vector for each word in the text data using statistical information about the words in the data. Let the dictionary size be $V$, with all words in the data numbered $1, \ldots, V$. A word–word co-occurrence matrix $X \in \mathbb{R}^{V \times V}$ is formed, where $x_{ij}$ indicates how many times word $i$ is used in the context of word $j$. Word $a$ occurs in the context of word $b$ if they appear in a fragment of text with no more than nine words between them. Let $X_i = \sum_k x_{ik}$ (the sum of row $i$). Then the probability that word $j$ occurs in the context of word $i$ is $P_{ij} = P(j \mid i) = \frac{x_{ij}}{X_i}$.

Note that if word $i$ occurs in the context of word $k$ more often than word $j$ does, then $\frac{P_{ik}}{P_{jk}} > 1$ and $\frac{P_{jk}}{P_{ik}} < 1$. Let us build a function $F(w_i, w_j, \hat{w}_k)$ that shows which of the words $i$ or $j$ is more likely to occur in the context of word $k$, where $w_i$, $w_j$, $\hat{w}_k$ are the vector representations of words $i$, $j$, and $k$:

$F(w_i, w_j, \hat{w}_k) = \dfrac{P_{ik}}{P_{jk}}$. (7)

The authors of the GloVe model suggested taking

$F\big((w_i - w_j)^T \hat{w}_k\big) = \dfrac{F(w_i^T \hat{w}_k)}{F(w_j^T \hat{w}_k)}$, with $F(w_i^T \hat{w}_k) = P_{ik} = \dfrac{x_{ik}}{X_i}$.

One can then choose $F(x) = \exp(x)$ and seek vectors $w_i$ such that $w_i^T \hat{w}_k = \log(P_{ik}) = \log(x_{ik}) - \log(X_i)$. Since $\log(X_i)$ does not depend on $k$, the problem is rewritten as

$w_i^T \hat{w}_k + b_i + \hat{b}_k = \log(x_{ik})$,

where the bias terms $b_i + \hat{b}_k$ absorb $\log(X_i)$. As a result, the authors minimize a loss function $J$, weighted by a function $f(x)$, and fit the model using the AdaGrad algorithm [4]. The function $f(x)$ must meet the following requirements: $f(0) = 0$; $f(x)$ is non-decreasing; $f(x)$ is relatively small for large values of $x$. The authors used

$f(x) = \begin{cases} \left( \frac{x}{x_{max}} \right)^{\alpha}, & x < x_{max}, \\ 1, & x \ge x_{max}, \end{cases}$

with the parameters chosen empirically as $\alpha = \frac{3}{4}$ and $x_{max} = 100$.
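For illustration, the following is a short Python sketch of the GloVe weighting function $f(x)$ with the empirical parameters $\alpha = 3/4$ and $x_{max} = 100$, together with one summand of the loss $J$ written in its standard weighted least-squares form; since $J$ is not written out above, the loss term is an assumption about GloVe's published form, not a quotation of this paper.

```python
import math

ALPHA, X_MAX = 0.75, 100.0  # empirical parameters alpha = 3/4, x_max = 100

def weight(x: float) -> float:
    # f(x) = (x / x_max)^alpha for x < x_max, and 1 otherwise
    return (x / X_MAX) ** ALPHA if x < X_MAX else 1.0

def loss_term(w_i, w_hat_k, b_i, b_hat_k, x_ik):
    # One summand of J (standard GloVe form, assumed here):
    # f(x_ik) * (w_i^T w_hat_k + b_i + b_hat_k - log x_ik)^2
    dot = sum(a * b for a, b in zip(w_i, w_hat_k))
    return weight(x_ik) * (dot + b_i + b_hat_k - math.log(x_ik)) ** 2

print(weight(10))   # ~0.178: rare co-occurrence is down-weighted
print(weight(500))  # 1.0: frequent co-occurrence is capped
```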
3. Problem statement

The problem of classifying textual information is formulated as follows: let there be a finite set of classes $C = \{c_1, c_2, \ldots, c_m\}$, a finite set of documents $D = \{d_1, d_2, \ldots, d_n\}$, and an unknown target function $f$ that determines the correspondence for each pair (document, class): $f : D \times C \to \{0, 1\}$. The task is to find a function $f_0$ that is as close as possible to the target function $f$, i.e. that minimizes $\| f - f_0 \|$ in Euclidean space. The function $f_0$ is called a classifier.

Text tonality is understood as the emotional vocabulary and the emotional assessment given by the author with respect to the object. The analysis of text tonality is of great practical importance for: evaluating the quality of goods and services based on user feedback on Internet resources; preventing extremism and terrorism; analyzing stock markets and forecasting the volatility (variability) of financial assets (in statistics, volatility is an indicator characterizing the fluctuations of time series or trends of market prices and incomes over time); and composing texts with predetermined emotional characteristics.

The main task of text tonality analysis is to identify opinions in the text and determine their properties. Opinions are of two types: comparative opinions and direct opinions. A direct opinion contains the author's statement about the object. A formal opinion is described as a tuple of 4 elements $K = \langle o(p), e(f), t, h \rangle$, where $o(p)$ is the orientation or polarity of the tonality assessment; $e(f)$ is the entity, i.e. the object of the tonality, or its property $f$; $t$ is the time of the opinion; and $h$ is the holder, i.e. the subject of the tonality (the author). Text tonality is assessed as neutral, negative, or positive.

There are different types of text classification: subjective; objective; and multiscale, i.e. classification on a multilevel scale versus classification on a binary scale. This article uses the Keras machine learning framework and the Python programming language to solve the problem.

4. An architecture model of a branched convolutional neural network with parallel and serial connections

The architecture of the new hybrid convolutional neural network (Fig. 2) consists of a convolutional block with parallel and serial connections between layers and a max-pooling layer applied to the matrix of the original text $x_0 = (x_1, x_2, \ldots, x_m) \in \mathbb{R}^{m \times d}$ of length $m$, where $x_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, m$. Different kernels can be used to convolve a sentence and extract different features in the proposed network. The extracted features are combined by the convolutional block with the max-pooling matrix obtained from $x_0$, and the combined features are classified with an MLP classifier.

It should be noted that the proposed hybrid convolutional neural network differs from the densely connected neural network DenseNet [27, 29]: in particular, it has a shorter convergence time and does not require multiple training iterations (less computational resources), because it lacks the DenseNet dense block used for text tonality classification [30, 31].

Figure 2: The structure of a hybrid convolutional neural network with parallel and serial connections between layers

The text matrix $x_0 = (x_1, x_2, \ldots, x_m) \in \mathbb{R}^{m \times d}$, $x_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, m$, of length $m$ is fed to the input of the branched convolutional neural network (Fig. 2).
This neural network (Fig. 2) consists of two parts: a convolutional block with parallel and serial connections between layers and a max-pooling layer. The convolutional block consists of layers with different window sizes, connected in parallel along the columns and sequentially along the rows of the structure. The input of each convolutional layer in a column is formed from the outputs of all previous layers. The text matrix $x_0$ is fed in parallel to convolutional layers with kernel sizes $5 \times d$, $4 \times d$, $3 \times d$, $2 \times d$.

To classify the text tonality, we use averaged general features, obtained by combining two feature sets (the features $\hat{\Phi}_1$ from the convolutional block and $\hat{\Phi}_2$ from the max-pooling layer) through global averaging. Each convolutional subnet extracts features from different word combinations, depending on the kernel size: in particular, the $5 \times d$ kernel extracts features from combinations of 5 words, and the input text matrix is processed similarly by the subnets with kernels $2 \times d$, $3 \times d$, $4 \times d$.

The input text matrix $x_0$ is fed into convolutional layers of size $5 \times d$, $4 \times d$, $3 \times d$, $2 \times d$ to extract features:

$\hat{y}_1^5 = f^{5 \times d}(x_0)$, $\hat{y}_1^4 = f^{4 \times d}(x_0)$, $\hat{y}_1^3 = f^{3 \times d}(x_0)$, $\hat{y}_1^2 = f^{2 \times d}(x_0)$, (8)

where $\hat{y}_1^5, \hat{y}_1^4, \hat{y}_1^3, \hat{y}_1^2$ are the feature matrices after the first convolutional layer with kernel sizes $5 \times d$, $4 \times d$, $3 \times d$, $2 \times d$.

The original input text matrix is then combined with the feature matrices obtained after the convolutional transformation, giving new input text matrices $\hat{x}_1^5, \hat{x}_1^4, \hat{x}_1^3, \hat{x}_1^2$:

$\hat{x}_1^5 = Cat(x_0, \hat{y}_1^5)$, $\hat{x}_1^4 = Cat(x_0, \hat{y}_1^4)$, $\hat{x}_1^3 = Cat(x_0, \hat{y}_1^3)$, $\hat{x}_1^2 = Cat(x_0, \hat{y}_1^2)$. (9)

The new input text matrices $\hat{x}_1^5, \hat{x}_1^4, \hat{x}_1^3, \hat{x}_1^2$ are fed into the second convolutional layers with kernels $5 \times d$, $4 \times d$, $3 \times d$, $2 \times d$:

$\hat{y}_2^5 = f^{5 \times d}(\hat{x}_1^5)$, $\hat{y}_2^4 = f^{4 \times d}(\hat{x}_1^4)$, $\hat{y}_2^3 = f^{3 \times d}(\hat{x}_1^3)$, $\hat{y}_2^2 = f^{2 \times d}(\hat{x}_1^2)$. (10)

To obtain new feature matrices, the following matrix concatenations are performed:

$\hat{x}_2^5 = Cat(x_0, \hat{y}_1^5, \hat{y}_2^5)$, $\hat{x}_2^4 = Cat(x_0, \hat{y}_1^4, \hat{y}_2^4)$, $\hat{x}_2^3 = Cat(x_0, \hat{y}_1^3, \hat{y}_2^3)$, $\hat{x}_2^2 = Cat(x_0, \hat{y}_1^2, \hat{y}_2^2)$. (11)

After the convolutional transformations, a pooling operation is applied to obtain new feature matrices:

$\hat{x}^{(1)} = h_{46}(\hat{x}_2^5)$, $\hat{x}^{(2)} = h_{47}(\hat{x}_2^4)$, $\hat{x}^{(3)} = h_{48}(\hat{x}_2^3)$, $\hat{x}^{(4)} = h_{49}(\hat{x}_2^2)$, (12)

where $\hat{x}^{(1)}, \hat{x}^{(2)}, \hat{x}^{(3)}, \hat{x}^{(4)}$ are the new feature matrices obtained after the pooling operations $h_{46}, h_{47}, h_{48}, h_{49}$.

The new feature matrices are then combined into the feature matrix of the multiscale convolutional feature extraction block:

$\hat{\Phi}_1 = Cat(\hat{x}^{(1)}, \hat{x}^{(2)}, \hat{x}^{(3)}, \hat{x}^{(4)})$, (13)

where the function $Cat$ denotes the merging of the matrices $\hat{x}^{(1)}, \hat{x}^{(2)}, \hat{x}^{(3)}, \hat{x}^{(4)}$.

The feature matrix from the max-pooling layer of dimension 50 is described by the formula

$\hat{\Phi}_2 = h_{50}(x_0)$. (14)

The feature matrices $\hat{\Phi}_1$ and $\hat{\Phi}_2$ are combined using the function $Cat$ to obtain the general feature matrix $\hat{\Phi}$:

$\hat{\Phi} = Cat(\hat{\Phi}_1, \hat{\Phi}_2)$, (15)

and a one-dimensional global averaging operation is applied to $\hat{\Phi}$ to obtain the final feature matrix $\tilde{\Phi}$:

$\tilde{\Phi} = g(\hat{\Phi})$, (16)

where $g$ is a one-dimensional average pooling. Finally, the feature matrix $\tilde{\Phi}$ is passed to the neural network classifier (MLP) to classify the text tonality.
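To make the data flow of Eqs. (8)–(16) concrete, below is a minimal Keras sketch of the described architecture (the paper states that Keras and Python are used; see Section 3). This is our reading of the equations, not the authors' released code: the number of filters, the ReLU activations, the pool sizes, and the hidden layer of the MLP classifier are illustrative assumptions, while the dropout rate, L2 weight, and optimizer settings follow those reported in Section 5.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

m, d = 150, 300          # sentence length and embedding dimension (Section 5)
num_classes = 3
filters = 64             # assumed number of filters per convolution

x0 = layers.Input(shape=(m, d))          # original text matrix x0

branches = []
for k in (5, 4, 3, 2):                   # kernels 5xd, 4xd, 3xd, 2xd
    y1 = layers.Conv1D(filters, k, padding="same", activation="relu")(x0)  # Eq. (8)
    x1 = layers.Concatenate()([x0, y1])                                    # Eq. (9)
    y2 = layers.Conv1D(filters, k, padding="same", activation="relu")(x1)  # Eq. (10)
    x2 = layers.Concatenate()([x0, y1, y2])                                # Eq. (11)
    branches.append(layers.MaxPooling1D(pool_size=2)(x2))                  # Eq. (12)

phi1 = layers.Concatenate()(branches)     # Eq. (13): multiscale block features
phi2 = layers.MaxPooling1D(pool_size=2)(x0)   # Eq. (14): max pooling of x0
phi = layers.Concatenate()([phi1, phi2])  # Eq. (15): general feature matrix

features = layers.GlobalAveragePooling1D()(phi)  # Eq. (16): 1-D global averaging
features = layers.Dropout(0.2)(features)         # dropout rate from Section 5
features = layers.Dense(128, activation="relu")(features)  # assumed MLP hidden layer
outputs = layers.Dense(num_classes, activation="softmax",
                       kernel_regularizer=tf.keras.regularizers.l2(1e-8))(features)

model = Model(x0, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

With "same" padding, every branch keeps the time dimension $m$, so the concatenations in Eqs. (9), (11), (13), and (15) align along the feature axis; after pooling with stride 2, both $\hat{\Phi}_1$ and $\hat{\Phi}_2$ have $m/2$ rows and can be merged before the global averaging.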
5. Computer experiment

Six datasets, divided into different text tonality categories, were selected for the computer experiment. They include:
• the GameMultiTweet dataset, consisting of 12,780 samples divided into three categories of 3,952, 915, and 7,913 samples;
• the SemEval dataset, consisting of 7,967 samples divided into three categories of 2,964, 1,151, and 3,852 samples;
• the SS-Tweet dataset, consisting of 4,242 samples divided into three categories of 1,953, 1,336, and 953 samples;
• the AG News dataset, consisting of 127,600 samples divided into four categories of 31,900 samples each;
• the R8 dataset, consisting of 4,203 samples divided into eight categories of 1,392, 241, 2,166, 20, 162, 0, 72, and 150 samples;
• the Yahoo! Answers dataset, consisting of 350,000 samples divided into ten categories of 23,726, 35,447, 31,492, 35,252, 35,546, 25,787, 81,571, 23,961, 28,706, and 28,482 samples.

All of these datasets were randomly divided into three parts: a 70% training set, a 15% validation set, and a 15% testing set. The statistics for each set are given in Table 1.

Table 1
Dataset information

Dataset          Train    Validation   Test    Categories   Avg. length
GameMultiTweet   8964     1917         1917    3            26
SemEval          5577     1195         1195    3            31
SS-Tweet         2870     636          636     3            29
AG News          89320    19140        19140   4            45
R8               2943     630          630     8            66
Yahoo! Answers   245000   52500        52500   10           112

Our model is compared with the following models:
• a CNN model consisting of three convolutional layers whose convolutional kernels have the same size;
• the TextCNN model proposed in [1];
• the FastText model proposed in [17];
• the DPCNN model proposed in [21].

In our research, each sentence was converted into a 150×300 matrix using word2vector. The Adam optimizer was used, with the learning rate set to 0.001, the dropout rate to 0.2, and the L2 loss weight to 10⁻⁸. The batch size was 50 and the number of epochs was 5; if the loss did not decrease over 10 consecutive epochs, training was stopped. Pre-trained 300-dimensional word2vector word embeddings were used.
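Continuing the Keras sketch from Section 4, this training configuration might look as follows; train_x, train_y, val_x, and val_y are hypothetical stand-ins for the prepared word2vector matrices and one-hot labels, and monitoring the validation loss for early stopping is our assumption, since the text does not specify which loss is tracked.

```python
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping

# Dummy stand-ins for the prepared 150x300 word2vector matrices and
# one-hot labels (illustrative only; 'model' is the network sketched above).
train_x = np.random.rand(500, 150, 300)
train_y = np.eye(3)[np.random.randint(0, 3, 500)]
val_x = np.random.rand(100, 150, 300)
val_y = np.eye(3)[np.random.randint(0, 3, 100)]

# Stop if the loss does not decrease over 10 consecutive epochs
# (validation loss assumed as the monitored quantity).
early_stopping = EarlyStopping(monitor="val_loss", patience=10)

# Batch size 50 and 5 epochs, per the settings stated above.
model.fit(train_x, train_y,
          validation_data=(val_x, val_y),
          batch_size=50, epochs=5,
          callbacks=[early_stopping])
```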
Table 2
Comparison of our model with others (classification accuracy, %)

Model       GameMultiTweet   SemEval   SS-Tweet   AG News   R8     Yahoo! Answers
CNN         73.5             60.5      50.2       85.6      92.3   47.3
TextCNN     77.5             62.7      51.1       88.9      94.4   49.5
FastText    78.3             63.8      51.4       88.5      96.1   39.8
DPCNN       75.6             47.5      43.2       87.1      88.5   47.5
Our Model   78.5             66.0      52.4       89.7      98.1   51.6

The findings presented in Table 2 show that our model achieved higher accuracy than its counterparts.

6. Conclusions

A hybrid convolutional neural network for text tonality analysis has been developed. It consists of a convolutional block with parallel and serial connections between layers and a max-pooling layer applied to the matrix of the original text. It is shown that this hybrid convolutional neural network extracts text features using the convolutional block and then classifies the features after combining them with the original text features. The model was found to have a shorter convergence time and lower computational cost than the parallel DenseNet. It was demonstrated that the hybrid convolutional neural network with parallel and serial connections between layers provides higher accuracy of text tonality classification on 6 different databases (GameMultiTweet, SemEval, SS-Tweet, AG News, R8, Yahoo! Answers) than the other baseline models.

7. References

[1] Kim Y. Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014, pp. 1746–1751. URL: https://arxiv.org/abs/1408.5882
[2] Daniel Jurafsky, James H. Martin. Speech and Language Processing, 2021. URL: https://web.stanford.edu/~jurafsky/slp3
[3] P. Liu, X. Qiu, X. Huang. Recurrent neural network for text classification with multi-task learning. In Proc. IJCAI, New York, USA, 2016, pp. 2873–2879. URL: https://arxiv.org/abs/1605.05101
[4] L. Yao, C. Mao, Y. Luo. Graph convolutional networks for text classification. In Proc. AAAI, Hawaii, USA, 2019, pp. 7370–7377. URL: https://arxiv.org/abs/1809.05679
[5] Zulkarnain, Tsarina Dwi Putri. Intelligent transportation systems (ITS): A systematic review using a Natural Language Processing (NLP) approach. Heliyon, 2021, Vol. 7, e08615. DOI: 10.1016/j.heliyon.2021.e08615
[6] J. M. Chenlo, D. E. Losada. An empirical study of sentence features for subjectivity and polarity classification. Information Sciences, 2014, Vol. 280, pp. 275–288. DOI: 10.1016/j.ins.2014.05.009
[7] C. Priyanka, D. Gupta. Identifying the best feature combination for sentiment analysis of customer reviews. In Proc. ICACCI, Mysore, India, 2013, pp. 102–108. DOI: 10.1109/ICACCI.2013.6637154
[8] E. Kouloumpis, T. Wilson, J. Moore. Twitter sentiment analysis: The good the bad and the omg! In Proc. ICWSM, Barcelona, Spain, 2011, pp. 538–541. URL: https://ojs.aaai.org/index.php/ICWSM/article/view/14185
[9] S. Sun, H. Liu, A. Abraham. Twitter part-of-speech tagging using preclassification Hidden Markov model. In Proc. IEEE SMC, Seoul, South Korea, 2012, pp. 1118–1123. DOI: 10.1109/ICSMC.2012.6377881
[10] C. dos Santos, M. Gatti. Deep convolutional neural networks for sentiment analysis of short texts. In Proc. COLING, Dublin, Ireland, 2014, pp. 69–78. URL: https://aclanthology.org/C14-1008
[11] Yanyan W., Qun C., Jiquan S., Boyi H., Murtadha A., Zhanhuai L. Machine Learning for Aspect-level Sentiment Analysis, 2019. URL: https://arxiv.org/abs/1906.02502
[12] D. Tang, F. Wei, B. Qin, L. Dong, T. Liu et al. A joint segmentation and classification framework for sentiment analysis. In Proc. EMNLP, Doha, Qatar, 2014, pp. 477–487. DOI: 10.3115/v1/D14-1054
[13] F. H. Khan, S. Bashir, U. Qamar. TOM: Twitter opinion mining framework using hybrid classification scheme. Decision Support Systems, 2014, Vol. 57, pp. 245–257. DOI: 10.1016/j.dss.2013.09.004
[14] W. Chamlertwat, P. Bhattarakosol, T. Rungkasiri, C. Haruechaiyasak. Discovering consumer insight from twitter via sentiment analysis. Journal of Universal Computer Science, 2012, Vol. 18, pp. 973–992. URL: https://www.semanticscholar.org/paper/Discovering-Consumer-Insight-from-Twitter-via-Chamlertwat-Bhattarakosol/b32c462e6a5821c62c852bb42a8730eff880f8cd
[15] Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Guillaume Lample, Chris Dyer. Evaluation of Word Vector Representations by Subspace Alignment. Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA, 2015. URL: https://aclanthology.org/D15-1243.pdf
[16] Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space, 2013. URL: https://arxiv.org/abs/1301.3781
[17] A. Joulin, É. Grave, P. Bojanowski, T. Mikolov. Bag of tricks for efficient text classification. In Proc. EACL, Valencia, Spain, 2017, pp. 427–431. URL: https://arxiv.org/abs/1607.01759
[18] S. Kombrink, T. Mikolov, M. Karafiát, L. Burget. Recurrent neural network based language modeling in meeting recognition. In Proc. INTERSPEECH, Florence, Italy, 2011, pp. 2877–2880. URL: https://www.semanticscholar.org/paper/Recurrent-Neural-Network-Based-Language-Modeling-in-Kombrink-Mikolov/b4fc91e543ec868658cde6170f1e59c33292e595
[19] J. Cheng, X. Zhang, P. Li, S. Zhang, Z. Ding et al. Exploring sentiment parsing of microblogging texts for opinion polling on chinese public figures. Applied Intelligence, 2016, Vol. 45, pp. 429–442. DOI: 10.1007/s10489-016-0768-0
[20] M. Sundermeyer, R. Schlüter, H. Ney. LSTM neural networks for language modeling. In Proc. INTERSPEECH, Portland, USA, 2012, pp. 194–197. DOI: 10.21437/Interspeech.2012-65
[21] R. Johnson, T. Zhang. Deep pyramid convolutional neural networks for text categorization. In Proc. ACL, Vancouver, Canada, 2017, pp. 562–570. DOI: 10.18653/v1/P17-1052
[22] S. Wang, M. Huang, Z. Deng. Densely connected CNN with multi-scale feature attention for text classification. In Proc. IJCAI, Stockholm, Sweden, 2018, pp. 4468–4474. URL: https://www.semanticscholar.org/paper/Densely-Connected-CNN-with-Multi-scale-Feature-for-Wang-Huang/35f0b854901dc6c5a69b271637d302f7db49b79a
[23] L. Yan, Y. H. Zheng, J. Cao. Few-shot learning for short text classification. Multimedia Tools and Applications, 2018, Vol. 77, pp. 29799–29810. DOI: 10.1007/s11042-018-5772-4
[24] L. Xiang, S. Yang, Y. Liu, Q. Li, C. Zhu. Novel linguistic steganography based on character-level text generation. Mathematics, 2020, Vol. 8, p. 1558. DOI: 10.3390/math8091558
[25] Z. Yang, S. Zhang, Y. Hu, Z. Hu, Y. Huang. VAE-Stega: Linguistic steganography based on variational auto-encoder. IEEE Transactions on Information Forensics and Security, 2021, Vol. 16, pp. 880–895. DOI: 10.1109/TIFS.2020.3023279
[26] L. Xiang, G. Guo, Q. Li, C. Zhu, J. Chen et al. Spam detection in reviews using LSTM-based multi-entity temporal features. Intelligent Automation & Soft Computing, 2020, Vol. 26, pp. 1375–1390. DOI: 10.32604/iasc.2020.013382
[27] Luqi Yan, Jin Han, Yishi Yue, Liu Zhang, Yannan Qian. Sentiment Analysis of Short Texts Based on Parallel DenseNet. Computers, Materials & Continua, 2021, Vol. 69, pp. 51–65. DOI: 10.32604/cmc.2021.016920
[28] J. Pennington, R. Socher, C. D. Manning. GloVe: Global vectors for word representation. In Proc. EMNLP, 2014, pp. 1532–1543. URL: https://nlp.stanford.edu/pubs/glove.pdf
[29] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger. Densely connected convolutional networks. In Proc. CVPR, Hawaii, USA, 2017, pp. 4700–4708. URL: https://arxiv.org/abs/1608.06993
[30] Vasyliuk A., Basyuk T. Construction features of the industrial environment control system. Proceedings of the 5th International Conference on Computational Linguistics and Intelligent Systems (COLINS 2021), Lviv, Ukraine, 2021, Vol. 2870, pp. 1011–1025. URL: http://ceur-ws.org/Vol-2870/paper76.pdf
[31] Basyuk T., Vasyliuk A. Approach to a subject area ontology visualization system creating. Proceedings of the 5th International Conference on Computational Linguistics and Intelligent Systems (COLINS 2021), Lviv, Ukraine, 2021, Vol. 2870, pp. 528–540. URL: http://ceur-ws.org/Vol-2870/paper39.pdf