Feature Enhanced Dual-GRU for Aspect-based Sentiment
Analysis
Meng Zhao, Jing Yang*, Shuo Wang and Jiaqi Liu
Harbin Engineering University, Harbin, Heilongjiang 150001, China

                Abstract
                Aspect-based sentiment analysis (ABSA) aims to predict the sentiment polarity with respect to
                different aspect terms or categories, which play an important role in guiding the representation
                of the context vector. Previous studies have used the concatenation operation as a common means
                of information aggregation, which increases irrelevant noise and loses the dependencies among the
                original features. In this paper, we propose a lightweight feature enhanced dual-GRU to
                selectively learn the feature relevance between aspect terms and context. The dual-GRU
                contains an extended aspect-related GRU and a position-related GRU to generate relevant
                information adaptively. Meanwhile, we construct a context-related GRU to enhance the
                dependency between aspect terms and context. Extensive experimental results demonstrate that
                the proposed model is reliable and effective in improving the performance on the two subtasks
                of ABSA.

                Keywords
                Aspect-based sentiment analysis, GRU, Attention networks, Position Information

1. Introduction

    Aspect-based sentiment analysis (ABSA) is a significantly more challenging task of fine-grained
sentiment classification towards specific aspect terms or categories. Concretely, ABSA comprises
two subtasks in current research: Aspect-Category Sentiment Analysis (ACSA) and Aspect-Term
Sentiment Analysis (ATSA); the difference between them is whether the aspect categories (aspect terms)
explicitly occur in the sentence. Each sentence may contain several aspect terms or aspect
categories, each of which could be a multi-word phrase or a single word. The same aspect may also appear in
multiple sentences with different polarities. For example, the sentiment polarity for “staff” in the sentence
“But the staff was so horrible to us.” is negative, while the sentiment polarity for “staff” in the sentence
“The wait staff is very friendly, if not overly efficient.” is positive.
    Various variants of the recurrent neural network (RNN) have become increasingly popular for sentiment
analysis. RNNs can model sequence data such as natural language, to the extent that each
word can be assumed to depend on the previous words. Unlike other natural language tasks, the main
challenge of aspect-based sentiment analysis is how to effectively use the latent representation
of the aspect in different sentence contexts. Previous studies [7][8][12] usually
concatenate the aspect embeddings directly to the context embeddings in order to obtain a more context-
sensitive representation. This direct concatenation embeds some irrelevant aspect representation, which
in turn distorts the representation of the context-aspect relationship and inevitably affects the prediction results.
Recent methods [6][29][30] have confirmed the effectiveness of the position of the aspect terms for
ABSA. A common approach is to concatenate position embeddings directly to the context
embeddings, which is simple and effective but prone to context-irrelevant noise from the position
information.


ICBASE2022@3rd International Conference on Big Data & Artificial Intelligence & Software Engineering, October 21-
23, 2022, Guangzhou, China
* Corresponding author
EMAIL: zhao_meng@hrbeu.edu.cn (Meng Zhao); yangjing@hrbeu.edu.cn (Jing Yang); tuzi@hrbeu.edu.cn (Shuo Wang);
ljq13@hrbeu.edu.cn (Jiaqi Liu)
ORCID: 0000-0002-9425-9800 (Meng Zhao); 0000-0001-6646-3401 (Jing Yang); 0000-0001-7462-1043 (Shuo Wang); 0000-0001-9697-
2928 (Jiaqi Liu)
             © 2022 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)



   To address this problem, inspired by Xing et al. [4], we propose a feature enhanced dual-GRU (FE-
GRU) approach that relies on internal gating mechanisms to control the information of the hidden state
learned from the input features. It is inappropriate to incorporate extra information directly into the hidden
state of a GRU [3] (or LSTM [2]). Consequently, we extend different GRU-based variants to guide the
extraction of the necessary information, and apply attention mechanisms to highlight the important
sequence information according to the dependencies within the sentences.
   The main contributions of our paper are as follows:
   1. We extend different GRU variants, the Aspect-related GRU and the Position-related GRU, to
        selectively embed the aspect and position information.
   2. We construct the Context-related GRU to enhance the dependency between aspect terms and
        context.
   3. We present a lightweight feature enhanced dual-GRU (FE-GRU) that guides this related
        information to predict sentiment polarity.

2. Related Works

    Aspect-based sentiment analysis [5] has received widespread attention in recent years. Many
approaches have emerged to deal with sentiment classification, among which deep learning
is currently the mainstream approach. Many neural network structures are based on the RNN,
especially the LSTM and its variant, the GRU. Neural networks can learn features by themselves
and can be designed with different structures for various tasks.
    Long Short-Term Memory (LSTM) is a modified version of the recurrent neural network that solves the
vanishing gradient problem of RNNs. Meanwhile, attention mechanisms [25][26] can alleviate the long-term
dependency problem experienced by LSTM models. Both have been explored in various NLP tasks and have
shown good performance. Wang et al. [7] concatenated the aspect embeddings to the hidden states
generated by an LSTM and to the input word embeddings, respectively, and utilized an
attention mechanism to capture the key parts of the sentence. Tang et al. [8] used two LSTMs, with the target as
the end point, to model the preceding and following contexts respectively, and concatenated the last
hidden vectors of the two LSTMs. The attention mechanism can distinguish the importance
of each element of the sequence and pay more attention to the parts relevant to the given
aspect. Chen et al. [10] leveraged a multiple-attention mechanism on top of a Bi-LSTM to capture sentiment
features separated by a long distance. Huang et al. [11] introduced attention-over-attention (AOA)
[13] to generate mutual attentions not only from aspect-to-text but also from text-to-aspect.
    The convolutional neural network (CNN) is also a means of extracting sentence features; it is better at
extracting local and position-invariant features than the RNN. Xue et al. [9] proposed a gated convolutional
network with aspect embedding (GCAE) based on a CNN and gating mechanisms. Li et al. [14] used a
bidirectional long short-term memory (Bi-LSTM) to produce the context information and designed a
target-specific transformation and a context-preserving mechanism to learn integrated word and target
representations rather than directly concatenating them. Their Target-specific
Transformation Networks (TNet) finally adopt a position-aware convolutional layer instead of a vanilla
convolutional layer. Liu and Shen [12] proposed a novel neural network framework, namely the Gated
Alternate Neural Network (GANN), which aims to enhance the model's capability of capturing
long-distance dependencies and modeling sequence information. Compared with the traditional RNN, the LSTM
and the Memory Network [15] have the ability of long-term memory. For aspect-level sentiment analysis,
many improved models [16][17][18][28] based on memory networks have emerged to fit the memory
of the features themselves.

3. Our Framework

   This section presents a novel feature enhanced dual-GRU model for ABSA. The architecture of the
proposed model is illustrated in Fig. 1.


3.1.    Variants of GRU

   Given a sentence $s = \{w_1, w_2, \dots, w_n\}$ of $n$ words, the aspect (aspect terms or categories)
contains $m$ ($m < n$) words $e = \{w_1, w_2, \dots, w_m\}$, the embedding vectors of the given aspect
are $a = \{a_1, a_2, \dots, a_m\}$, and the embedding vectors of the sentence are
$x = \{x_1, x_2, \dots, x_n\}$. The purpose of ABSA is to predict the sentiment polarity
$y \in \{0, 1, 2, 3\}$ or $y \in \{0, 1, 2\}$ for sentence $s$, where 0, 1, 2 and 3 denote the “negative”,
“neutral”, “positive” and “conflict” sentiment polarities, respectively.




Figure 1: Architecture of our FE-GRU. Blue indicates the position-related information, orange indicates
the aspect-related information, and green indicates the context-related information.

3.1.1. Aspect-related GRU

    To reduce the noise from aspect-irrelevant information, we design the Aspect-related GRU (A-GRU), which
is also a variant of the AA-LSTM [4]. Similarly, we add an aspect-reset gate and an aspect-update gate to
control how much aspect information flows into the hidden state. Fig. 2 (a) illustrates the architecture
of the Aspect-related GRU. The core structure of the A-GRU consists of the reset and update gates for the
aspect information and the input information. Through these core gates, the aspect and input information
combine with the previous hidden state to output a new hidden state at each time step. Therefore, the
hidden state can carry aspect-related information throughout the processing of the time sequence.
    At time step $t$, the aspect-reset gate $r_a$, the reset gate $r_t$ and the candidate state
$\tilde{h}^a_t$ are computed as follows:

    $r_a = \sigma(W_{ra}[a; h^a_{t-1}] + b_{ra})$                                  (1)
    $r_t = \sigma(W_{rt}[x_t; h^a_{t-1}] + r_a \odot a + b_{rt})$                  (2)
    $\tilde{h}^a_t = \tanh(W_{ha}[r_t \odot h^a_{t-1}; x_t])$                      (3)
   where $h^a_{t-1}$ denotes the previous hidden state, $a$ is the aspect embedding vector, and $\sigma$ is
the sigmoid activation function. The candidate $\tilde{h}^a_t$, calculated with $r_t$, represents the new
aspect-related input information at the current moment. The aspect-update gate $z_a$, the update gate $z_t$
and the new hidden state $h^a_t$ are computed as follows:
    $z_a = \sigma(W_{za}[a; h^a_{t-1}] + b_{za})$                                  (4)
    $z_t = \sigma(W_{zt}[x_t; h^a_{t-1}] + z_a \odot a + b_{zt})$                  (5)
    $h^a_t = (1 - z_t) \odot h^a_{t-1} + z_t \odot \tilde{h}^a_t$                  (6)
   where the aspect-update gate $z_a$ combines the input information with the aspect information to produce
the aspect-related context information, which may contain context-irrelevant noise. The aspect-update
gate controls how much aspect-related information is brought into the update gate $z_t$, while the
update gate $z_t$ decides how much aspect-related input information to add and how much aspect-
irrelevant input information to discard.




                (a): Aspect-related GRU                                  (b): Position-related GRU
Figure 2: Architecture of different GRU variants.

    The extended aspect gates allow the GRU to forget certain irrelevant parts of the aspect information
while keeping the previous semantic information. We regard the irrelevant parts of the aspect
information and context information as noise, a factor that degrades the prediction of the given
aspect’s sentiment polarity. Our extended A-GRU solves this problem by using the aspect gates to learn
to reduce the noise in the process of information transfer.
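
    As a concrete illustration, the following is a minimal sketch of one A-GRU time step following
Eqs. (1)-(6), written in PyTorch. The class name AGRUCell and the projection matrices U_ra and U_za
(used to map the gated aspect vector into the hidden dimension; the paper's equations add
$r_a \odot a$ directly, implying equal dimensions) are our own assumptions, not the authors' released code.

    import torch
    import torch.nn as nn

    class AGRUCell(nn.Module):
        """One Aspect-related GRU step, following Eqs. (1)-(6) (sketch)."""
        def __init__(self, input_dim, aspect_dim, hidden_dim):
            super().__init__()
            # Aspect gates read [aspect; previous hidden state] (Eqs. 1 and 4)
            self.W_ra = nn.Linear(aspect_dim + hidden_dim, aspect_dim)
            self.W_za = nn.Linear(aspect_dim + hidden_dim, aspect_dim)
            # Reset/update gates read [input; previous hidden state] (Eqs. 2 and 5)
            self.W_rt = nn.Linear(input_dim + hidden_dim, hidden_dim)
            self.W_zt = nn.Linear(input_dim + hidden_dim, hidden_dim)
            # Assumed projections of the gated aspect vector into the hidden dimension
            self.U_ra = nn.Linear(aspect_dim, hidden_dim, bias=False)
            self.U_za = nn.Linear(aspect_dim, hidden_dim, bias=False)
            # Candidate state (Eq. 3)
            self.W_ha = nn.Linear(hidden_dim + input_dim, hidden_dim)

        def forward(self, x_t, a, h_prev):
            r_a = torch.sigmoid(self.W_ra(torch.cat([a, h_prev], -1)))          # Eq. (1)
            r_t = torch.sigmoid(self.W_rt(torch.cat([x_t, h_prev], -1))
                                + self.U_ra(r_a * a))                           # Eq. (2)
            h_cand = torch.tanh(self.W_ha(torch.cat([r_t * h_prev, x_t], -1)))  # Eq. (3)
            z_a = torch.sigmoid(self.W_za(torch.cat([a, h_prev], -1)))          # Eq. (4)
            z_t = torch.sigmoid(self.W_zt(torch.cat([x_t, h_prev], -1))
                                + self.U_za(z_a * a))                           # Eq. (5)
            return (1 - z_t) * h_prev + z_t * h_cand                            # Eq. (6)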

3.1.2. Position-related GRU

   We construct the Position-related GRU (P-GRU) to dynamically learn the position information of
aspects in each sentence. Fig. 2 (b) illustrates the architecture of the Position-related GRU. We add a
position gate to control the inflow of position information, like the aspect gates of the A-GRU, but leave
out a position-update gate. The P-GRU can be formalized as follows:
    $r_p = \sigma(W_{rp}[p; h^p_{t-1}] + b_{rp})$                                  (7)
    $r_t = \sigma(W_{rt}[x_t; h^p_{t-1}] + r_p \odot p + b_{rt})$                  (8)
    $z_t = \sigma(W_{zt}[x_t; h^p_{t-1}] + b_{zt})$                                (9)
    $\tilde{h}^p_t = \tanh(W_{hp}[r_t \odot h^p_{t-1}; x_t])$                      (10)
    $h^p_t = (1 - z_t) \odot h^p_{t-1} + z_t \odot \tilde{h}^p_t$                  (11)
   where $p$ is the position embedding vector, which is invariant across the sequence. At time
step $t$, the position-reset gate $r_p$ calculates how much sensitive position information flows into
the hidden state. In the next step, the position information is controlled directly by the reset gate $r_t$
of the input information rather than by a position-update gate; $r_t$ ensures that the position information
is reserved for the aspect words during the learning process of the input information.
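
   A corresponding sketch of one P-GRU step, under the same assumptions as the A-GRU sketch above
(PyTorch, hypothetical names, an assumed projection U_rp for the gated position vector), might look as
follows. Note that, per Eq. (9), the update gate sees only the input and the previous hidden state.

    import torch
    import torch.nn as nn

    class PGRUCell(nn.Module):
        """One Position-related GRU step, following Eqs. (7)-(11) (sketch)."""
        def __init__(self, input_dim, pos_dim, hidden_dim):
            super().__init__()
            self.W_rp = nn.Linear(pos_dim + hidden_dim, pos_dim)        # Eq. (7)
            self.W_rt = nn.Linear(input_dim + hidden_dim, hidden_dim)   # Eq. (8)
            self.U_rp = nn.Linear(pos_dim, hidden_dim, bias=False)      # assumed projection
            self.W_zt = nn.Linear(input_dim + hidden_dim, hidden_dim)   # Eq. (9)
            self.W_hp = nn.Linear(hidden_dim + input_dim, hidden_dim)   # Eq. (10)

        def forward(self, x_t, p, h_prev):
            # p is the position embedding, constant across the whole sequence
            r_p = torch.sigmoid(self.W_rp(torch.cat([p, h_prev], -1)))          # Eq. (7)
            r_t = torch.sigmoid(self.W_rt(torch.cat([x_t, h_prev], -1))
                                + self.U_rp(r_p * p))                           # Eq. (8)
            z_t = torch.sigmoid(self.W_zt(torch.cat([x_t, h_prev], -1)))        # Eq. (9): no position term
            h_cand = torch.tanh(self.W_hp(torch.cat([r_t * h_prev, x_t], -1)))  # Eq. (10)
            return (1 - z_t) * h_prev + z_t * h_cand                            # Eq. (11)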

3.1.3. Context-related GRU

   Compared with the LSTM, the GRU has fewer parameters and is easier to compute when adding aspect and
position gates. We design the Context-related GRU (C-GRU) to take into account the context-related
information of the sentence, while also associating the aspect with the information of the sentence.
Unlike the A-GRU, the C-GRU adds the aspect information directly to the hidden state through the gate
mechanism.
   Comparatively, the C-GRU is a reversed A-GRU: it swaps the contextual sequence input and the aspect
input, and adds control gates to preserve the original features of the aspect terms. A max-pooling layer
selects the maximum value among local features, by which means we can extract the important aspect-related
information. Thus, we introduce a Maxout layer to compress the aspect-aware information and add back the
pristine aspect features, which can be obtained as follows:
    $c = \mathrm{Maxout}(h^c_t)$                                                   (12)
    $c' = \beta \odot a + c$                                                       (13)

   where $h^c_t$ is the output of the C-GRU and $\beta$ is a trade-off parameter.
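
   A minimal sketch of Eqs. (12)-(13), assuming a standard maxout layer (project to k pieces and take
the elementwise maximum); the piece count k, the default beta, and the class name AspectCompress are
our own choices, not the paper's.

    import torch
    import torch.nn as nn

    class AspectCompress(nn.Module):
        """Maxout compression plus pristine aspect residual, Eqs. (12)-(13) (sketch)."""
        def __init__(self, hidden_dim, aspect_dim, k=2, beta=0.5):
            super().__init__()
            self.k = k            # number of maxout pieces (our assumption)
            self.beta = beta      # trade-off parameter of Eq. (13)
            self.proj = nn.Linear(hidden_dim, aspect_dim * k)

        def forward(self, h_c, a):
            # Maxout: project to k candidate vectors, keep the elementwise maximum (Eq. 12)
            pieces = self.proj(h_c).view(*h_c.shape[:-1], -1, self.k)
            c = pieces.max(dim=-1).values
            return self.beta * a + c          # Eq. (13): add back the original aspect features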




3.2.    Information fusion

   For ABSA tasks, we explicitly set the position index of each aspect word in the sentence to zero, and
define the relative distance to indicate the importance of each word relative to the aspect terms [12][21].
The final context-rich information is expressed as follows:
    $H = \mathrm{GRU}(X)$                                                          (14)
    $H^a = \text{A-GRU}(H, \mathrm{Aspect})$                                       (15)
    $H^p = \text{P-GRU}(H, \mathrm{Position})$                                     (16)
    $H^{final} = [H^p; H^a]$                                                       (17)
   We introduce attention mechanisms to highlight the dependency between the rich contextual
features and the context-related aspect information. The degree of attention is expressed by the weights
of the words, which can be written as:

    $c_i = \mathrm{attention}(h^{pa}, c')$                                         (18)
   In the final layer, we use a softmax layer that outputs as many nodes as there are sentiment classes:

    $\hat{y}_i = \mathrm{softmax}(W \cdot c_i + b)$                                (19)

   where the softmax operator produces the probability $\hat{y}_i$ of each class label, and $W$ and $b$ are
the model parameters.
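
   The fusion and classification steps of Eqs. (17)-(19) could be sketched as follows. The paper only
names "attention", so the bilinear scoring form below, and all parameter names, are assumptions.

    import torch

    def fuse_and_classify(H_p, H_a, c_prime, W_attn, W_out, b_out):
        """Eqs. (17)-(19) (sketch): concatenate the two streams, attend with c',
        then classify. Shapes: H_p, H_a are (batch, n, d); c_prime is (batch, e)."""
        H_final = torch.cat([H_p, H_a], dim=-1)                   # Eq. (17): (batch, n, 2d)
        # Bilinear attention scores between each position and c' (form assumed)
        scores = torch.einsum('bnd,de,be->bn', H_final, W_attn, c_prime)
        alpha = torch.softmax(scores, dim=-1)                     # attention weights
        c_i = torch.einsum('bn,bnd->bd', alpha, H_final)          # Eq. (18): weighted context
        return torch.softmax(c_i @ W_out + b_out, dim=-1)         # Eq. (19): class probabilities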

3.3.    Objective function

  The final loss function consists of two parts: the Euclidean loss and the cross-entropy loss. For the
C-GRU, we iteratively minimize the squared Euclidean distance between the original aspect terms and $c'$,
which is defined as:

    $d(c', a) = \sum_{i=1}^{n} (c'_i - a_i)^2$                                     (20)
   For the ABSA task, we take the Cross-entropy Loss as the final loss function:


    $\mathcal{L} = -\sum_{i=0}^{N} y_i \log(\hat{y}_i) + d$                        (21)

    where $\hat{y}_i$ is the predicted sentiment distribution for each class, $y_i$ is the true sentiment
polarity, and $N$ is the number of training sentences.
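
   A sketch of the combined objective of Eqs. (20)-(21), assuming the two terms are summed with equal
weight and that F.cross_entropy receives pre-softmax logits (it applies log-softmax internally):

    import torch
    import torch.nn.functional as F

    def fe_gru_loss(logits, labels, c_prime, aspect):
        """Cross-entropy plus squared Euclidean distance, Eqs. (20)-(21) (sketch)."""
        d = ((c_prime - aspect) ** 2).sum()     # Eq. (20): squared Euclidean distance
        ce = F.cross_entropy(logits, labels)    # Eq. (21): cross-entropy term
        return ce + d                           # equal weighting assumed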

4. Experiments

    In this section, we introduce the four datasets used to verify the effectiveness of our model and provide
details of the parameter settings. In addition, we extend the ATAE [7] and AOA [11] models and design
different variants of FE-GRU for an ablation study.

4.1.    Datasets

   We experiment on the widely used SemEval datasets: Restaurant 2014-2016 and Laptop 2014 [22]
for the ABSA task. The SemEval datasets consist of laptop and restaurant reviews. We retain the reviews
with the “conflict” sentiment polarity and divide the reviews into four sentiment polarities: positive,
neutral, negative, and conflict, because it is unreasonable to map conflict reviews to positive or negative
sentiment polarities or to remove them entirely [23]. The details of the datasets for the ABSA task
are shown in Table 1.

Table 1
Statistics of datasets for the ABSA task.
   Tasks    Dataset            Positive    Negative    Neutral    Conflict    Total
   ATSA     R-14     train     2164        805         633        91          3693
                     test      728         196         196        14          1134
            L-14     train     987         866         460        45          2358
                     test      314         128         169        16          654
            R-15     train     905         258         34         -           1197
                     test      324         189         29         -           542
            R-16     train     1227        452         62         -           1741
                     test      461         122         29         -           612
   ACSA     R-14     train     2179        839         633        500         3713
                     test      657         222         94         52          1025

   In our experiments, the pre-trained GloVe [1] embeddings are adopted as the word embeddings; the
dimensions of the word embeddings, aspect embeddings and position embeddings are all set to 300, and the
maximum sequence length of a sentence is set to 83. Out-of-vocabulary words and weight matrices are
initialized from a uniform distribution U(-0.25, 0.25). Note that we remove the P-GRU when assessing the
ACSA task, because the aspect category does not appear in the sentence. The batch size is set to {16,32}
and the learning rate is set to {0.003,0.007}.
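
   The embedding setup described above might be implemented as follows; build_embedding_matrix is a
hypothetical helper, and glove is assumed to be a word-to-vector mapping loaded from the pre-trained
GloVe file.

    import numpy as np

    def build_embedding_matrix(vocab, glove, dim=300):
        """OOV words and weights drawn from U(-0.25, 0.25); GloVe vectors kept as-is."""
        matrix = np.random.uniform(-0.25, 0.25, (len(vocab), dim)).astype('float32')
        for word, idx in vocab.items():
            if word in glove:                 # keep the pre-trained vector when available
                matrix[idx] = glove[word]
        return matrix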

4.2.    Variants of FE-GRU

   We utilize some baseline approaches to evaluate the effectiveness of the proposed FE-GRU model,
and compare different variants of GRU to verify the effectiveness of the aspect gates and the position
gates.
   • A-GRU is a particularly simple model to control the flow of aspect information via the aspect
       gates.



   •   PA-GRU introduces the position gates and combines the hidden states of P-GRU and A-GRU to
       output the final sentiment class.
   •   ATAE_A replaces the hidden state of the original LSTM with the aspect-related state based on
       ATAE-LSTM.
   •   ATAE_PA adds the position gates to control the extent of position information flowing into the
       hidden state based on ATAE_A.
   •   AOA_A uses the A-GRU instead of the original LSTM and outputs the hidden state with aspect-
       related information.
   •   AOA_PA concatenates the hidden state of A-GRU through the position-related information
       generated by P-GRU.
   •   w/o_GRU removes the A-GRU and P-GRU cells and uses the word embeddings directly as input
       to the model.
   •   w/o_P-GRU removes the A-GRU of our FE-GRU.
   •   w/o_A-GRU removes the P-GRU of our FE-GRU.
   •   w/o_C-GRU removes the C-GRU of our FE-GRU.

4.3.    Results and analyses

    The results of the experiments comparing the baseline models are shown in Table 2 and Table 3. The
proposed FE-GRU obviously achieves better results than the baseline models, because our model has the
ability to selectively learn the dependency information of the aspect terms. ATAE_LSTM and GCAE among the
baseline models achieve good classification accuracy partly because the aspect information is directly
concatenated to the context information. This is a very common choice in existing models, but it lacks the
ability to distinguish irrelevant information. AA_LSTM is designed to influence the information flow
instead of integrating the aspect information into the hidden state vectors. ATAE_AA_LSTM and
IAN_AA_LSTM, based on AA_LSTM, effectively retain the aspect-related information compared with the
original models. However, AA_LSTM introduces three gates into the LSTM, increasing the number of training
parameters and making the training process more difficult.
    This proves that the optimized gates can control the extent of information flowing into the hidden
state. It also shows that combining position information directly into the context, as existing models
do, is not appropriate. In addition, the designed C-GRU associates aspect information with the context
as a clue to guide the aspect features, which aims to enhance the contextual dependencies.

Table 2
Experimental results of baseline models on the ATSA task. Bold indicates the best result.
 Models              R-14            L-14            R-15            R-16
                     Acc.    F1      Acc.    F1      Acc.    F1      Acc.    F1
 TD_LSTM[8]          73.19   51.66   62.38   45.42   69.01   46.75   79.07   52.54
 ATAE_LSTM[7]        77.60   53.51   68.80   52.99   77.38   53.27   84.92   54.50
 MemNet[15]          72.04   46.63   69.13   55.32   72.32   49.13   82.35   58.68
 IAN[19]             76.63   50.58   68.04   47.07   75.46   50.34   82.42   52.91
 RAM[10]             77.07   59.25   68.96   54.60   76.88   52.83   86.30   55.59
 AOA_LSTM[11]        77.25   50.38   69.11   47.92   78.96   56.55   85.62   55.49
 GCAE[9]             77.68   55.96   69.26   48.20   77.86   52.18   86.27   55.25
 ATAE_AA_LSTM[4]     78.21   53.05   70.03   48.67   79.56   53.56   86.30   68.83
 IAN_AA_LSTM[4]      77.95   57.00   69.72   49.03   77.38   51.60   85.23   56.87
 FE-GRU              79.54   60.89   71.12   55.96   79.56   59.26   86.92   61.47




Table 3
Experimental results of different variant models on the ACSA task. Note that FE-GRU* does not contain
P-GRU.
 ACSA Models                         R-14
                             Acc.            F1
 A-GRU                       78.24           58.47
 ATAE_LSTM                   81.17           63.98
 GCAE                        80.58           60.73
 ATAE_A                      80.97           61.80
 ATAE_AA_LSTM                81.85           64.27
 FE-van-GRU                  80.87           62.68
 FE-A-GRU                    82.06           64.84
 FE-GRU*                     82.24           65.45

   We choose the simple A-GRU and PA-GRU models for comparison to demonstrate the significant contribution
of position information to sentiment classification; the experimental results are shown in Table 4. The
results of the comparative experiments show that the model is sensitive to the specific position of aspect
terms in the sentence, which is learned and determined by the P-GRU.

Table 4
Experimental results of different variant models on the ATSA task. Bold indicates that the extended
model achieved better results than the original.
 Extended Models     R-14            L-14            R-15            R-16
                     Acc.    F1      Acc.    F1      Acc.    F1      Acc.    F1
 A-GRU               74.87   49.53   63.76   38.26   75.54   52.15   82.61   53.83
 PA-GRU              75.37   51.63   67.12   46.62   75.71   49.66   83.07   57.70
 ATAE-A              77.16   50.14   68.34   46.61   77.25   49.79   84.61   53.00
 ATAE-PA             79.10   60.41   70.94   49.69   78.72   52.86   85.29   59.98
 AOA-A               77.16   47.88   70.18   48.97   78.55   52.84   85.38   54.17
 AOA-PA              77.78   56.58   69.57   48.09   79.06   56.24   85.84   54.84
 w/o_GRU             76.45   46.51   66.36   44.41   76.63   48.10   84.00   56.82
 w/o_P-GRU           77.60   50.10   68.96   47.88   77.21   53.24   84.46   57.30
 w/o_A-GRU           78.48   60.85   69.41   48.90   79.73   55.07   86.15   58.93
 w/o_C-GRU           78.92   60.22   70.33   49.31   79.89   56.80   86.61   59.46

    In addition, the ATAE_LSTM model with position information performs much better than the original
model, which concatenates the aspect embedding with the hidden outputs of the LSTM to generate attention
weights. We replace the hidden outputs of the LSTM with the aspect-related context information and
concatenate the position-related context information to calculate the weight of each sequence element.
The only difference between ATAE_A and ATAE_PA is that the latter has the ability to learn position
information, and the comparison between the two ATAE groups demonstrates the reliability of learning
position information through gate mechanisms. We use the hidden state of PA-GRU to replace the original
LSTM output, which shows that the position-related information and the aspect-related information can
improve the sentiment classification performance of the AOA model.

5. Conclusions

   In this paper, we have proposed a feature enhanced dual-GRU (FE-GRU) for the ABSA task. The
purpose of the proposed FE-GRU is to improve the validity and correlation of the context and the aspect
information as much as possible. We designed different feature embedding strategies in the lightweight
GRU to associate and enhance the specific information dependencies, a novel approach to aggregating
information that replaces the single concatenation operation. We conducted extensive experiments with
numerous models on the ACSA and ATSA tasks and achieved significant improvements on most datasets.
The experimental results prove the rationality and effectiveness of the proposed approach.

6. Acknowledgements

   This paper is supported by the National Natural Science Foundation of China under Grant
nos.61672179, 61370083.

7. References

[1] J. Pennington, R. Socher, and C. Manning, “Glove: Global vectors for word representation,” in
     Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
     (EMNLP), 2014.
[2] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp.
     1735–1780, 1997.
[3] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural
     networks on sequence modeling,” arXiv [cs.NE], 2014.
[4] B. Xing et al., “Earlier attention? Aspect-aware LSTM for aspect-based sentiment analysis,” in
     Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019.
[5] B. Keith Norambuena, E. F. Lettura, and C. M. Villegas, “Sentiment analysis and opinion mining
     applied to scientific paper reviews,” Intell. Data Anal., vol. 23, no. 1, pp. 191–214, 2019.
[6] M. Yang, Q. Jiang, Y. Shen, Q. Wu, Z. Zhao, and W. Zhou, “Hierarchical human-like strategy for
     aspect-level sentiment classification with sentiment linguistic knowledge and reinforcement
     learning,” Neural Netw., vol. 117, pp. 240–248, 2019.
[7] Y. Wang, M. Huang, X. Zhu, and L. Zhao, “Attention-based LSTM for Aspect-level Sentiment
     Classification,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language
     Processing, 2016.
[8] D. Tang, B. Qin, X. Feng, and T. Liu, “Effective LSTMs for Target-Dependent Sentiment
     Classification,” arXiv [cs.CL], 2015.
[9] W. Xue and T. Li, “Aspect based sentiment analysis with gated convolutional networks,” in
     Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018.
[10] P. Chen, Z. Sun, L. Bing, and W. Yang, “Recurrent attention network on memory for aspect
     sentiment analysis,” in Proceedings of the 2017 Conference on Empirical Methods in Natural
     Language Processing, 2017.
[11] B. Huang, Y. Ou, and K. M. Carley, “Aspect level sentiment classification with attention-over-
     attention neural networks,” in Social, Cultural, and Behavioral Modeling, Cham: Springer
     International Publishing, 2018, pp. 197–206.
[12] N. Liu and B. Shen, “Aspect-based sentiment analysis with gated alternate neural network,” Knowl.
     Based Syst., vol. 188, no. 105010, p. 105010, 2020.
[13] Y. Cui, Z. Chen, S. Wei, S. Wang, T. Liu, and G. Hu, “Attention-over-attention neural networks
     for reading comprehension,” in Proceedings of the 55th Annual Meeting of the Association for
     Computational Linguistics, 2017.
[14] X. Li, L. Bing, W. Lam, and B. Shi, “Transformation networks for target-oriented sentiment
     classification,” in Proceedings of the 56th Annual Meeting of the Association for Computational
     Linguistics (Volume 1: Long Papers), 2018.
[15] D. Tang, B. Qin, and T. Liu, “Aspect level sentiment classification with deep memory network,”
     in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing,
     2016.
[16] Z. Zhang, L. Wang, Y. Zou, and C. Gan, “The optimally designed dynamic memory networks for
     targeted sentiment classification,” Neurocomputing, vol. 309, pp. 36–45, 2018.



[17] S. Wang, S. Mazumder, B. Liu, M. Zhou, and Y. Chang, “Target-sensitive memory networks for
     aspect sentiment classification,” in Proceedings of the 56th Annual Meeting of the Association for
     Computational Linguistics (Volume 1: Long Papers), 2018.
[18] N. Liu and B. Shen, “ReMemNN: A novel memory neural network for powerful interaction in
     aspect-based sentiment analysis,” Neurocomputing, vol. 395, pp. 66–77, 2020.
[19] D. Ma, S. Li, X. Zhang, and H. Wang, “Interactive Attention Networks for Aspect-Level Sentiment
     Classification,” in Proceedings of the Twenty-Sixth International Joint Conference on Artificial
     Intelligence, 2017.
[20] L. Li, Y. Liu, and A. Zhou, “Hierarchical attention based position-aware network for aspect-level
     sentiment analysis,” in Proceedings of the 22nd Conference on Computational Natural Language
     Learning, 2018.
[21] J. Zhou, Q. Chen, J. X. Huang, Q. V. Hu, and L. He, “Position-aware hierarchical transfer model
     for aspect-level sentiment classification,” Inf. Sci. (Ny), vol. 513, pp. 1–16, 2020.
[22] M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, and S. Manandhar,
     “SemEval-2014 Task 4: Aspect Based Sentiment Analysis,” in Proceedings of the 8th International
     Workshop on Semantic Evaluation (SemEval 2014), 2014.
[23] X. Tan, Y. Cai, and C. Zhu, “Recognizing conflict opinions in aspect-level sentiment classification
     with dual attention networks,” in Proceedings of the 2019 Conference on Empirical Methods in
     Natural Language Processing and the 9th International Joint Conference on Natural Language
     Processing (EMNLP-IJCNLP), 2019.
[24] L. Xu, H. Li, W. Lu, and L. Bing, “Position-aware tagging for aspect sentiment triplet extraction,”
     in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing
     (EMNLP), 2020.
[25] G. Xu, Z. Zhang, T. Zhang, S. Yu, Y. Meng, and S. Chen, “Aspect-level sentiment classification
     based on attention-BiLSTM model and transfer learning,” Knowl. Based Syst., vol. 245, no.
     108586, p. 108586, 2022.
[26] Z. Zhou and F. Liu, “Filter gate network based on multi-head attention for aspect-level sentiment
     classification,” Neurocomputing, vol. 441, pp. 214–225, 2021.
[27] Z. Liu, J. Wang, X. Du, Y. Rao, and X. Quan, “GSMNet: Global semantic memory network for
     aspect-level sentiment classification,” IEEE Intell. Syst., vol. 36, no. 5, pp. 122–130, 2021.
[28] P. Lin, M. Yang, and J. Lai, “Deep selective memory network with selective attention and inter-
     aspect modeling for aspect level sentiment classification,” IEEE ACM Trans. Audio Speech Lang.
     Process., vol. 29, pp. 1093–1106, 2021.
[29] B. Huang et al., “Aspect-level sentiment analysis with aspect-specific context position information,”
     Knowl. Based Syst., vol. 243, no. 108473, p. 108473, 2022.
[30] D. Shao et al., “Aspect-level sentiment analysis for based on joint aspect and position hierarchy
     attention mechanism network,” J. Intell. Fuzzy Syst., vol. 42, no. 3, pp. 2207–2218, 2022.



