<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Feature Enhanced Dual-GRU for Aspect-based Sentiment Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Meng Zhao</string-name>
          <email>zhao_meng@hrbeu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jing Yang</string-name>
          <email>yangjing@hrbeu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shuo Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiaqi Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Harbin Engineering University</institution>
          ,
          <addr-line>Harbin, Heilongjiang 150001</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>92</fpage>
      <lpage>101</lpage>
      <abstract>
<p>Aspect-based sentiment analysis (ABSA) aims to predict the sentiment polarity with respect to different aspect terms or categories, which play an important role in guiding the representation of the context vector. Previous studies have used the concatenation operation as a common means of information aggregation, which increases irrelevant noise and loses the dependence between the original features. In this paper, we propose a lightweight feature enhanced dual-GRU to selectively learn the feature relevance between aspect terms and context. The dual-GRU contains an extended aspect-related GRU and a position-related GRU to generate relevant information adaptively. Meanwhile, we construct a context-related GRU to enhance the dependency between aspect terms and context. Extensive experimental results demonstrate that the proposed model is reliable and effective in improving the performance of the two tasks of ABSA.</p>
      </abstract>
      <kwd-group>
        <kwd>Aspect-based sentiment analysis</kwd>
        <kwd>GRU</kwd>
        <kwd>Attention networks</kwd>
        <kwd>Position Information</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>embeddings, which is simple and effective but prone to context-irrelevant noise from position
information.</p>
      <p>
        To address this problem, inspired by Xing et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we propose a feature enhanced dual-GRU
(FE-GRU) approach that relies on the internal gating mechanism to control the information of the hidden state
learned from the input features, since it is inappropriate to incorporate the extra information directly into the hidden state
of a GRU [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (or LSTM [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). Consequently, we extend different GRU-based variants to guide the
extraction of the necessary information, and apply attention mechanisms to highlight the important
sequence information based on the dependencies of the sentences.
      </p>
      <p>The main contributions of our paper are as follows:
1. We extend different GRU variants, the Aspect-related GRU and the Position-related GRU, to
selectively embed the aspect and the position information.
2. We construct the Context-related GRU to enhance the dependency between aspect terms and
context.
3. We present a lightweight feature enhanced dual-GRU (FE-GRU) to guide such related
information to predict sentiment polarity.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Aspect-based sentiment analysis [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] has received widespread attention in recent years. Many
approaches have emerged to deal with sentiment classification, among which deep learning
is currently the mainstream approach. Many neural network
structures are based on RNNs, especially the LSTM and its variant the GRU. Neural networks have the ability
to learn features by themselves and can be designed with different structures for various tasks.
      </p>
      <p>
        Long Short-Term Memory is a modified version of the recurrent neural network designed to solve the vanishing
gradient problem of RNNs. At the same time, the attention mechanism [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ][
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] can alleviate the long-term
dependence problem experienced by the LSTM model. Both have been explored in various NLP tasks and have
shown good performance. Wang et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] concatenated the aspect
embeddings to the hidden states generated by the LSTM and to the input word embeddings, and utilized
an attention mechanism to capture the key part of the sentence. Tang et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] used two LSTMs with the target as
the end point to model the preceding and following contexts respectively, and concatenated the last
hidden vectors of the two LSTMs. The attention mechanism has the ability to distinguish the importance
of each piece of sequence information and pays more attention to the particular sequence relevant to the given
aspect. Chen et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] leveraged a multiple-attention mechanism to capture sentiment features separated
by a long distance through a Bi-LSTM. Huang et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] introduced attention-over-attention (AOA)
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to generate mutual attentions not only from aspect-to-text but also from text-to-aspect.
      </p>
      <p>
        The convolutional neural network (CNN) is another means of extracting sentence features; it is better at
extracting local and position-invariant features than the RNN. Xue et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proposed a gated convolutional
network with aspect embedding (GCAE) based on a CNN and gating mechanisms. Li et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] used a
bidirectional long short-term memory (Bi-LSTM) network to produce the context information and designed a
target-specific transformation and a context-preserving mechanism to learn integrated word and target
representations rather than directly concatenating them. Their Target-specific
Transformation Networks (TNet) finally adopt a position-aware convolutional layer instead of a vanilla
convolutional layer. Liu et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] proposed a novel neural network framework, namely the Gated
Alternate Neural Network (GANN), which aims to enhance the capability of the model in capturing
long-distance dependencies and modeling sequence information. Compared with the traditional RNN and LSTM,
the Memory Network [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] has the ability of long-term memory. For aspect-level sentiment analysis,
many improved models [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ][
        <xref ref-type="bibr" rid="ref17">17</xref>
        ][
        <xref ref-type="bibr" rid="ref18">18</xref>
        ][
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] based on memory networks have emerged to fit the memory
of the features themselves.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Our Framework</title>
      <p>This section presents a novel feature enhanced dual-GRU model for ABSA. The architecture of the
proposed model is illustrated in Fig. 1.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Variants of GRU</title>
      <p>Given a sentence S = {w_1, w_2, …, w_n}, the aspect (aspect terms or categories) contains m (m &lt; n) words
A = {a_1, a_2, …, a_m}; the embedding vectors of the given aspect are a = {a_1, a_2, …, a_m}, and the embedding
vectors of the sentence are X = {x_1, x_2, …, x_n}. The purpose of ABSA is to predict the sentiment polarity y ∈
{0, 1, 2, 3} or y ∈ {0, 1, 2} for sentence S, where 0, 1, 2 and 3 denote the “negative”, “neutral”, “positive”
and “conflict” sentiment polarities, respectively.</p>
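      <p>As an illustration of this formulation (our own sketch, not part of the original paper; the helper name and the example strings are hypothetical), a (sentence, aspect) pair is encoded into token lists and an integer polarity label y:</p>

```python
# Hypothetical encoding of the ABSA formulation: a (sentence, aspect) pair
# is mapped to one of the polarity labels y in {0, 1, 2, 3}.
POLARITIES = {0: "negative", 1: "neutral", 2: "positive", 3: "conflict"}
LABELS = {name: idx for idx, name in POLARITIES.items()}

def encode_example(sentence, aspect, polarity):
    """Return (sentence tokens, aspect tokens, integer label y)."""
    return sentence.lower().split(), aspect.lower().split(), LABELS[polarity]

tokens, aspect_tokens, y = encode_example(
    "The fish was fresh but the service was slow", "service", "negative")
# y == 0, and the aspect length m is smaller than the sentence length n
```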
    </sec>
    <sec id="sec-5">
      <title>3.1.1. Aspect-related GRU</title>
      <p>
        To reduce the noise from aspect-irrelevant information, we design the Aspect-related GRU (A-GRU), which is
a variant of the AA-LSTM [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Similarly, we add an aspect-reset gate and an aspect-update gate to
control how much aspect information flows into the hidden state. Fig. 2 (a) illustrates the architecture
of the Aspect-related GRU. The core structure of A-GRU consists of the reset gates and the update gates for the aspect
information and the input information. Through these core gates, the aspect and input information
combine with the previous hidden state to output a new hidden state at each time step. Therefore, the
hidden state can carry aspect-related information throughout the processing of the time sequence.
      </p>
      <p>At time step t, the aspect-reset gate r_a, the reset gate r_t and the candidate state h̃_a are computed as follows:
r_a = σ(W_ra [a; h^a_{t−1}] + b_ra) (1)
r_t = σ(W_rt [x_t; h^a_{t−1}] + r_a ⊙ a + b_rt) (2)
h̃_a = tanh(W_ha [r_t * h^a_{t−1}; x_t]) (3)
where h^a_{t−1} denotes the previous hidden state, a is the aspect embedding vector, x_t is the current input embedding and σ is the sigmoid activation function. The candidate h̃_a, calculated with r_t, represents the new aspect-related input information at the current moment. The aspect-update gate z_a, the update gate z_t and the new hidden state h^a_t are computed as follows:
z_a = σ(W_za [a; h^a_{t−1}] + b_za) (4)
z_t = σ(W_zt [x_t; h^a_{t−1}] + z_a ⊙ a + b_zt) (5)
h^a_t = (1 − z_t) * h^a_{t−1} + z_t * h̃_a (6)
The aspect-update gate z_a combines the input information with the aspect information to produce the aspect-related context information, which may contain context-irrelevant noise. The aspect-update gate controls how much aspect-related information is brought into the update gate z_t, while the update gate z_t decides how much aspect-related input information to add and how much aspect-irrelevant input information to throw away.</p>
      <p>Fig. 2. (a): Aspect-related GRU; (b): Position-related GRU.</p>
      <p>The extended aspect gates allow the GRU to forget certain irrelevant parts of the aspect information and keep the previous semantic information simultaneously. We regard the irrelevant parts of the aspect information and the context information as noise, a factor that influences the result of predicting the given aspect’s sentiment polarity. Our extended A-GRU solves this problem by using the aspect gates to learn to reduce the noise in the process of information transfer.</p>
    </sec>
    <sec id="sec-6">
      <title>3.1.2. Position-related GRU</title>
      <p>We construct the Position-related GRU (P-GRU) to dynamically learn the position information of the aspect in each sentence. Fig. 2 (b) illustrates the architecture of the Position-related GRU. We add a position gate to control the inflow of position information, like the aspect gates of A-GRU, but leave out a position update gate. P-GRU can be formalized as follows:
r_p = σ(W_rp [p; h^p_{t−1}] + b_rp) (7)
r_t = σ(W_rt [x_t; h^p_{t−1}] + r_p ⊙ p + b_rt) (8)
z_t = σ(W_zt [x_t; h^p_{t−1}] + b_zt) (9)
h̃_p = tanh(W_hp [r_t * h^p_{t−1}; x_t]) (10)
h^p_t = (1 − z_t) * h^p_{t−1} + z_t * h̃_p (11)
where p is the position embedding vector, which is invariant along the sequence chain. At time step t, the position-reset gate r_p calculates how much position-sensitive information flows into the hidden state. In the next step, the position information is controlled directly by the reset gate r_t of the input information rather than by a position update gate; r_t ensures that the position information is reserved for the aspect words during the learning process of the input information.</p>
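      <p>As a concrete sketch of the A-GRU step described above (our own NumPy illustration, not the authors' code; the dictionary keys, the equal hidden/embedding dimension, and the sigmoid choice for σ are assumptions), the gated update reads:</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def a_gru_step(x_t, h_prev, a, W, b):
    """One Aspect-related GRU step; W/b map gate names to weights/biases.

    [u; v] is implemented as concatenation, * is the elementwise product,
    and the hidden size is assumed equal to the embedding size so that
    gates can modulate the aspect vector a directly.
    """
    cat = np.concatenate
    r_a = sigmoid(W["ra"] @ cat([a, h_prev]) + b["ra"])              # aspect-reset gate
    r_t = sigmoid(W["rt"] @ cat([x_t, h_prev]) + r_a * a + b["rt"])  # reset gate
    h_cand = np.tanh(W["h"] @ cat([r_t * h_prev, x_t]))              # candidate state
    z_a = sigmoid(W["za"] @ cat([a, h_prev]) + b["za"])              # aspect-update gate
    z_t = sigmoid(W["zt"] @ cat([x_t, h_prev]) + z_a * a + b["zt"])  # update gate
    return (1.0 - z_t) * h_prev + z_t * h_cand                       # new hidden state

d = 4  # toy dimension for illustration
rng = np.random.default_rng(0)
W = {k: 0.1 * rng.standard_normal((d, 2 * d)) for k in ["ra", "rt", "h", "za", "zt"]}
b = {k: np.zeros(d) for k in ["ra", "rt", "za", "zt"]}
h = a_gru_step(rng.standard_normal(d), np.zeros(d), rng.standard_normal(d), W, b)
```

The P-GRU step differs only in that the position vector p replaces a in the reset path and the extra update gate is dropped.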
    </sec>
    <sec id="sec-7">
      <title>3.1.3. Context-related GRU</title>
      <p>Compared to the LSTM, the GRU has fewer parameters and is easier to compute when adding the aspect and
position gates. We design the Context-related GRU (C-GRU) to take into account the context-related information of
the sentence while also associating the aspect with the information of the sentence. Unlike A-GRU, C-GRU
adds the aspect information directly into the hidden state, with the context information introduced through the gate
mechanism.</p>
      <p>Comparatively, C-GRU is a reversed A-GRU: it swaps the contextual sequence input and the aspect
input, and adds control gates to preserve the original features of the aspect terms. A max-pooling layer selects
the maximum value among local features, by which means we can extract the important aspect-related
information. Thus, we introduce a Maxout layer to compress the aspect-aware information and add back the
pristine aspect features, which can be obtained as follows:
c = Maxout(h^c_t) (12)
c′ = β * a + c (13)
where h^c_t is the output of C-GRU and β is a trade-off parameter.</p>
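      <p>A minimal sketch of this readout (our own illustration; reading the paper's Maxout/max-pool step as max-over-time pooling is an assumption, as are the toy values):</p>

```python
import numpy as np

def c_gru_readout(h_c, a, beta=0.5):
    """Compress the C-GRU outputs and re-add the pristine aspect features.

    h_c: (seq_len, d) hidden states from the C-GRU; a: (d,) aspect vector.
    c = Maxout(h_c) is taken here as max-over-time pooling, and
    c' = beta * a + c with trade-off parameter beta.
    """
    c = h_c.max(axis=0)   # strongest local feature in each dimension
    return beta * a + c

h_c = np.array([[0.1, -0.3], [0.4, 0.2], [-0.2, 0.5]])
a = np.array([1.0, -1.0])
c_prime = c_gru_readout(h_c, a, beta=0.5)
# c = [0.4, 0.5], so c' = 0.5 * a + c = [0.9, 0.0]
```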
    </sec>
    <sec id="sec-8">
      <title>3.2. Information fusion</title>
      <p>
        For ABSA tasks, we explicitly set the position index of each aspect word in the sentence to zero, and
define the relative distance to indicate the importance of each word relative to the aspect terms[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ][
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
The final context-rich information is expressed as follows:
H = GRU(X) (14)
H^a = A-GRU(H, Aspect) (15)
H^p = P-GRU(H, Position) (16)
H^final = [H^p; H^a] (17)
      </p>
      <p>We introduce the attention mechanism to highlight the dependency between the rich contextual
features and the context-related aspect information. The extent of concern is expressed by the weights of
the words:
c_i = attention(h^pa, c′) (18)
In the final layer, we use a softmax layer to output the same number of nodes as the number of
sentiment classes:
y_i = softmax(W * c_i + b) (19)
where the softmax operator is used to obtain the probability y_i of each class label, and W and b are
the model parameters.</p>
    </sec>
    <sec id="sec-8a">
      <title>3.3. Objective function</title>
      <p>The final loss function consists of two parts: the Euclidean loss and the cross-entropy loss. For the
C-GRU, we iteratively minimize the squared Euclidean distance between the original aspect terms and c′:
d(c′, a) = Σ_{i=1}^{n} (c′ − a)² (20)
For the ABSA task, we take the cross-entropy loss, so the final loss function is:
L = −Σ_{i=0}^{N} (ȳ_i log(y_i)) + d (21)
where y_i is the predicted sentiment distribution of each class, ȳ_i is the true sentiment polarity and N
is the number of all training sentences.</p>
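      <p>The combined objective above can be sketched as follows (our own NumPy illustration; the unit weighting between the two loss terms and the toy probabilities are assumptions):</p>

```python
import numpy as np

def fe_gru_loss(probs, labels, c_prime, a):
    """Cross-entropy over the predicted class probabilities plus the
    squared Euclidean distance d(c', a) that keeps the compressed
    aspect-aware vector c' close to the original aspect features."""
    d = np.sum((c_prime - a) ** 2)                 # Euclidean part
    onehot = np.eye(probs.shape[1])[labels]        # true polarities
    ce = -np.sum(onehot * np.log(probs + 1e-12))   # cross-entropy part
    return ce + d

probs = np.array([[0.7, 0.1, 0.1, 0.1],    # two sentences, four polarities
                  [0.2, 0.6, 0.1, 0.1]])
loss = fe_gru_loss(probs, np.array([0, 1]), np.zeros(3), np.zeros(3))
# cross-entropy part = -(log 0.7 + log 0.6); the distance term is zero here
```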
    </sec>
    <sec id="sec-9">
      <title>4. Experiments</title>
      <p>
        In this section, we introduce four datasets to verify the effectiveness of our model and provide
details of the parameter settings. In addition, we extend the ATAE [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and AOA [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] models and design
different variants of FE-GRU for an ablation study.
      </p>
    </sec>
    <sec id="sec-10">
      <title>4.1. Datasets</title>
      <p>
        We experiment on the widely used SemEval datasets, Restaurant 2014–2016 and Laptop 2014 [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ],
for the ABSA task. The SemEval datasets consist of laptop and restaurant reviews. We retain the reviews
with the sentiment polarity “conflict” and divide the reviews into four sentiment polarities: positive,
neutral, negative and conflict, because it is unreasonable to label conflict reviews as positive or negative
or even to remove them [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The details of the datasets for the ABSA task
are shown in Table 1.
      </p>
      <p>
        In our experiments, the pre-trained GloVe [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] embeddings are adopted as word embeddings for the
datasets; the dimensions of the word, aspect and position embeddings are set
to 300, and the maximum sequence length of a sentence is set to 83. The out-of-vocabulary words and
weight matrices are initialized from a uniform distribution U(−0.25, 0.25). Note that we remove the
P-GRU to assess the influence of position information in the ACSA task, because the aspect category
does not appear in the sentence. The batch size is set to {16, 32} and the learning rate to
{0.003, 0.007}.
      </p>
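      <p>The embedding initialization described above can be sketched as follows (our own illustration; the stand-in GloVe table and the helper name are hypothetical):</p>

```python
import numpy as np

EMB_DIM = 300          # word/aspect/position embedding size from the paper
rng = np.random.default_rng(0)

def lookup(word, glove):
    """Return the pre-trained vector, or a U(-0.25, 0.25) sample for OOV words."""
    if word in glove:
        return glove[word]
    return rng.uniform(-0.25, 0.25, EMB_DIM)

glove = {"service": np.zeros(EMB_DIM)}  # stand-in for the real GloVe table
vec = lookup("unseenword", glove)       # OOV word gets a uniform random vector
```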
    </sec>
    <sec id="sec-11">
      <title>4.2. Variants of FE-GRU</title>
      <p>We utilize some baseline approaches to evaluate the effectiveness of the proposed FE-GRU model,
and compare different variants of GRU to verify the effectiveness of the aspect gates and the position
gates.</p>
      <p>• A-GRU is a particularly simple model that controls the flow of aspect information via the aspect
gates.</p>
      <p>• PA-GRU introduces the position gates and combines the hidden states of P-GRU and A-GRU to
output the final sentiment class.</p>
      <p>• ATAE_A replaces the hidden state of the original LSTM with the aspect-information state,
based on ATAE-LSTM.</p>
      <p>• ATAE_PA adds the position gates to control the extent of position information flowing into the
hidden state, based on ATAE_A.</p>
      <p>• AOA_A uses the A-GRU instead of the original LSTM and outputs the hidden state with
aspect-related information.</p>
      <p>• AOA_PA concatenates the hidden state of A-GRU with the position-related information
generated by P-GRU.</p>
      <p>• w/o_GRU removes the A-GRU and P-GRU cells and uses the word embeddings directly as input
to the model.</p>
      <p>• w/o_P-GRU removes the P-GRU of our FE-GRU.</p>
      <p>• w/o_A-GRU removes the A-GRU of our FE-GRU.</p>
      <p>• w/o_C-GRU removes the C-GRU of our FE-GRU.</p>
    </sec>
    <sec id="sec-12">
      <title>4.3. Results and analyses</title>
      <p>The results of the experiments comparing against the baseline models are shown in Table 2 and Table 3. It is
obvious that the proposed FE-GRU achieves better results than the baseline models, because our model
has the ability to selectively learn the dependency information of the aspect terms. ATAE_LSTM and
GCAE among the baseline models have good classification accuracy partly because the aspect information
is directly concatenated with the context information. This is a very common choice in various existing
models but lacks the ability to distinguish irrelevant information. AA_LSTM is designed to influence
the information flow instead of integrating the aspect information into the hidden state vectors.
ATAE_AA_LSTM and IAN_AA_LSTM, based on AA_LSTM, effectively retain the aspect-related
information compared to the original models. However, AA_LSTM introduces three gates into the LSTM,
increasing the training parameters and making the training process more difficult.</p>
      <p>It is proved that the optimized gates can control the extent of information flowing into the hidden
state. It is also not appropriate to combine position information directly into context in the existing
models. In addition, the designed C-GRU associates aspect information into contexts as the clue to
guide aspect features, which aims to enhance contextual dependencies.</p>
      <p>We chose the simple A-GRU and PA-GRU models for comparison to prove the significant contribution
of position information in sentiment classification; the experimental results are shown in Table 4. The
results of the comparative experiments show that P-GRU is sensitive to the specific position of the aspect
terms in the sentence, which is learned and determined by P-GRU.</p>
      <p>In addition, the experimental performance of the ATAE_LSTM model with position information is much
better than that of the original model, which concatenates the aspect embedding and the hidden outputs of
the LSTM to generate attention weights. We replace the hidden outputs of the LSTM with the aspect-related
context information and concatenate the position-related context information to calculate the weight of
each sequence. The only difference between ATAE_A and ATAE_PA is that the latter has the ability
to learn position information, and the experiments on the two groups of ATAE models demonstrate the reliability of
learning position information through gate mechanisms. We use the hidden state of PA-GRU to replace
the original LSTM output, which implies that the position-related and aspect-related
information of the AOA model can improve the effect of sentiment classification.</p>
    </sec>
    <sec id="sec-13">
      <title>5. Conclusions</title>
      <p>In this paper, we have proposed a feature enhanced dual-GRU (FE-GRU) for the ABSA task. The
purpose of the proposed FE-GRU is to improve the validity and correlation of the context and the aspect
information as much as possible. We designed different feature embedding strategies in the lightweight
GRU to associate and enhance the specific information dependencies, which is a novel approach to
aggregating information instead of the single concatenation operation. We conducted extensive
experiments with numerous models on the ACSA and ATSA tasks and achieved significant
improvements on most datasets. The experimental results prove the rationality and effectiveness
of the proposed approach.</p>
    </sec>
    <sec id="sec-14">
      <title>6. Acknowledgements</title>
      <p>This paper is supported by the National Natural Science Foundation of China under Grant
Nos. 61672179 and 61370083.</p>
    </sec>
    <sec id="sec-15">
      <title>7. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          , “Glove:
          <article-title>Global vectors for word representation</article-title>
          ,”
          <source>in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>“Long short-term memory,” Neural Comput</article-title>
          ., vol.
          <volume>9</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gulcehre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , “
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv [cs</article-title>
          .NE],
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Xing</surname>
          </string-name>
          et al., “
          <article-title>Earlier attention? Aspect-aware LSTM for aspect-based sentiment analysis</article-title>
          ,
          <source>” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Keith Norambuena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Lettura</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          , “
          <article-title>Sentiment analysis and opinion mining applied to scientific paper reviews,” Intell. Data Anal.</article-title>
          , vol.
          <volume>23</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>214</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , “
          <article-title>Hierarchical human-like strategy for aspect-level sentiment classification with sentiment linguistic knowledge and reinforcement learning,” Neural Netw</article-title>
          ., vol.
          <volume>117</volume>
          , pp.
          <fpage>240</fpage>
          -
          <lpage>248</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L</given-names>
            .
            <surname>Zhao</surname>
          </string-name>
          , “
          <article-title>Attention-based LSTM for Aspect-level Sentiment Classification,”</article-title>
          <source>in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          , and T. Liu, “
          <article-title>Effective LSTMs for Target-Dependent Sentiment Classification,” arXiv [cs</article-title>
          .CL],
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W.</given-names>
            <surname>Xue</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          , “
          <article-title>Aspect based sentiment analysis with gated convolutional networks,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bing</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Yang</surname>
          </string-name>
          , “
          <article-title>Recurrent attention network on memory for aspect sentiment analysis</article-title>
          ,
          <source>” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ou</surname>
          </string-name>
          , and
          <string-name>
            <surname>K. M. Carley</surname>
          </string-name>
          , “
          <article-title>Aspect level sentiment classification with attention-overattention neural networks</article-title>
          ,” in Social, Cultural, and Behavioral Modeling, Cham: Springer International Publishing,
          <year>2018</year>
          , pp.
          <fpage>197</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Shen</surname>
          </string-name>
          , “
          <article-title>Aspect-based sentiment analysis with gated alternate neural network</article-title>
          ,
          <source>” Knowl. Based Syst.</source>
          , vol.
          <volume>188</volume>
          , no.
          <issue>105010</issue>
          , p.
          <fpage>105010</fpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hu</surname>
          </string-name>
          , “
          <article-title>Attention-over-attention neural networks for reading comprehension</article-title>
          ,” in
          <source>Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lam</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          , “
          <article-title>Transformation networks for target-oriented sentiment classification</article-title>
          ,” in
          <source>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Qin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          , “
          <article-title>Aspect level sentiment classification with deep memory network</article-title>
          ,” in
          <source>Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gan</surname>
          </string-name>
          , “
          <article-title>The optimally designed dynamic memory networks for targeted sentiment classification</article-title>
          ,”
          <source>Neurocomputing</source>
          , vol.
          <volume>309</volume>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>45</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mazumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          , “
          <article-title>Target-sensitive memory networks for aspect sentiment classification</article-title>
          ,” in
          <source>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Shen</surname>
          </string-name>
          , “
          <article-title>ReMemNN: A novel memory neural network for powerful interaction in aspect-based sentiment analysis</article-title>
          ,”
          <source>Neurocomputing</source>
          , vol.
          <volume>395</volume>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>77</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , “
          <article-title>Interactive attention networks for aspect-level sentiment classification</article-title>
          ,” in
          <source>Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , “
          <article-title>Hierarchical attention based position-aware network for aspect-level sentiment analysis</article-title>
          ,” in
          <source>Proceedings of the 22nd Conference on Computational Natural Language Learning</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Hu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>He</surname>
          </string-name>
          , “
          <article-title>Position-aware hierarchical transfer model for aspect-level sentiment classification</article-title>
          ,”
          <source>Inf. Sci. (Ny)</source>
          , vol.
          <volume>513</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pontiki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Galanis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pavlopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Papageorgiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Androutsopoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Manandhar</surname>
          </string-name>
          , “
          <article-title>SemEval-2014 Task 4: Aspect Based Sentiment Analysis</article-title>
          ,” in
          <source>Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>X.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , “
          <article-title>Recognizing conflict opinions in aspect-level sentiment classification with dual attention networks</article-title>
          ,” in
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Bing</surname>
          </string-name>
          , “
          <article-title>Position-aware tagging for aspect sentiment triplet extraction</article-title>
          ,” in
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>G.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Meng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          , “
          <article-title>Aspect-level sentiment classification based on attention-BiLSTM model and transfer learning</article-title>
          ,”
          <source>Knowl. Based Syst.</source>
          , vol.
          <volume>245</volume>
          , no.
          <issue>108586</issue>
          , p.
          <fpage>108586</fpage>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          , “
          <article-title>Filter gate network based on multi-head attention for aspect-level sentiment classification</article-title>
          ,”
          <source>Neurocomputing</source>
          , vol.
          <volume>441</volume>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>225</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Quan</surname>
          </string-name>
          , “
          <article-title>GSMNet: Global semantic memory network for aspect-level sentiment classification</article-title>
          ,”
          <source>IEEE Intell. Syst.</source>
          , vol.
          <volume>36</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>122</fpage>
          -
          <lpage>130</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Lai</surname>
          </string-name>
          , “
          <article-title>Deep selective memory network with selective attention and inter-aspect modeling for aspect level sentiment classification</article-title>
          ,”
          <source>IEEE ACM Trans. Audio Speech Lang. Process.</source>
          , vol.
          <volume>29</volume>
          , pp.
          <fpage>1093</fpage>
          -
          <lpage>1106</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>B.</given-names>
            <surname>Huang</surname>
          </string-name>
          et al., “
          <article-title>Aspect-level sentiment analysis with aspect-specific context position information</article-title>
          ,”
          <source>Knowl. Based Syst.</source>
          , vol.
          <volume>243</volume>
          , no.
          <issue>108473</issue>
          , p.
          <fpage>108473</fpage>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>D.</given-names>
            <surname>Shao</surname>
          </string-name>
          et al., “
          <article-title>Aspect-level sentiment analysis based on joint aspect and position hierarchy attention mechanism network</article-title>
          ,”
          <source>J. Intell. Fuzzy Syst.</source>
          , vol.
          <volume>42</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>2207</fpage>
          -
          <lpage>2218</lpage>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>