=Paper= {{Paper |id=Vol-2960/paper15 |storemode=property |title=A General Aspect-Term-Extraction Model for Multi-Criteria Recommendations (Long paper) |pdfUrl=https://ceur-ws.org/Vol-2960/paper15.pdf |volume=Vol-2960 |authors=Paolo Pastore,Andrea Iovine,Fedelucio Narducci,Giovanni Semeraro |dblpUrl=https://dblp.org/rec/conf/recsys/PastoreINS21 }} ==A General Aspect-Term-Extraction Model for Multi-Criteria Recommendations (Long paper)== https://ceur-ws.org/Vol-2960/paper15.pdf
A General Aspect-Term-Extraction Model for Multi-Criteria
Recommendations
Paolo Pastore¹, Andrea Iovine², Fedelucio Narducci¹ and Giovanni Semeraro²
¹ Polytechnic University of Bari, Italy
² Dept. of Computer Science, University of Bari, Italy


Abstract
In recent years, increasingly large quantities of user reviews have been made available by several e-commerce platforms. This content is very useful for recommender systems (RSs), since it reflects the users’ opinions of the items regarding several aspects. In fact, it is especially valuable for RSs that are able to exploit multi-faceted user ratings. However, extracting aspect-based ratings from unstructured text is not a trivial task. Deep Learning models for aspect extraction have proven to be effective, but they need to be trained on large quantities of domain-specific data, which are not always available. In this paper, we explore the possibility of transferring knowledge across domains for automatically extracting aspects from user reviews, and its implications in terms of recommendation accuracy. We performed different experiments with several Deep Learning-based Aspect Term Extraction (ATE) techniques and Multi-Criteria recommendation algorithms. Results show that our framework is able to improve recommendation accuracy compared to several baselines based on single-criteria recommendation, despite the fact that no labeled data in the target domain was used when training the ATE model.

Keywords
multi-criteria recommendation, deep learning, aspect term extraction, domain adaptation, transfer learning



3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) & 5th Edition of Recommendation in Complex Environments (ComplexRec) Joint Workshop @ RecSys 2021, 27 September – 1 October 2021, Amsterdam, Netherlands
paolo.pastore1@poliba.it (P. Pastore); andrea.iovine@uniba.it (A. Iovine); fedelucio.narducci@poliba.it (F. Narducci); giovanni.semeraro@uniba.it (G. Semeraro)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org

1. Introduction

Nowadays, many Web platforms and e-commerce websites allow customers to express their opinions by providing reviews on items, services, or media. Such user-generated content is extremely valuable for recommendation, since it reflects the user’s perception of a specific item and of specific features of that item, listing its strengths and weaknesses, the most important features, and the tasks for which it is more (or less) suitable. Extracting this information and exploiting it to enrich user profiles and item descriptions can give enormous advantages to Recommender Systems (RSs). Given the considerable importance of reviews in the recommendation process, many works in the literature have proposed integrating them into RSs as a way to improve their accuracy. Specifically, text reviews can be a solution to the rating sparsity problem often encountered by RSs based on Collaborative Filtering (CF), and can be used to capture a much more fine-grained model of the customer’s preferences [1]. Accordingly, instead of modeling the user’s profile as a set of (item, rating) pairs, it might be represented as a set of (item, aspect, rating) triples. Of course, the problem with this approach is that both aspects and ratings must be extracted automatically from unstructured text. This task is usually referred to as Aspect-Based Sentiment Analysis (ABSA). ABSA is not a trivial task, because there is no stable definition of “aspect”, due to its intrinsic subjectivity. Also, the same aspect can appear in many different forms inside user reviews. For instance, a reviewer could use “service”, “staff” or “waiter” to refer to the “service” category. For this reason, we distinguish between the aspect itself and its representation forms in the reviews, also called aspect terms. Furthermore, the aspects used in one domain are completely different from those in other domains: for restaurants, users will mention features such as the food or the quality of the service; when talking about smartphones, they will instead refer to other aspects such as the screen or the camera. In recent years, many Deep Learning models for automatically extracting aspects from text have been proposed. However, these techniques need to be trained on domain-specific labeled datasets that are not always available.

In this paper, we investigate the application of domain adaptation strategies for aspect-based recommendation. The aim is to evaluate the effectiveness of modern Deep Learning-based Aspect Term Extraction (ATE) models when no annotated data is available for the target domain. For this purpose, we developed an aspect-based recommendation framework that includes an ATE module, an Aspect Clustering module, a Sentiment Analysis (SA) module, and a Multi-Criteria Recommender System. We performed an experimental study to compare several ATE models both in a single domain scenario and in a domain adaptation setting. We then chose the
model that obtained the best performance in both settings, i.e. the model that is most able to capture the essential, domain-invariant characteristics of aspect terms. Finally, we tested the framework in a recommendation scenario, to understand whether the models involved in this study actually improve the accuracy of RSs, compared to single-criteria recommendation baselines. This will show that our framework is able to successfully extract fine-grained ratings from text, and exploit them for improving the quality of the recommendations.

In summary, the main contributions of this work are: (a) The definition of a novel framework for aspect-based recommendation that can automatically extract aspect-based ratings from unstructured text (i.e. reviews) independently from the domain, using Deep Learning models; (b) An evaluation of the performance of Deep Learning-based ATE models in a domain adaptation setting (i.e. when no annotated data in the target domain is available); (c) An evaluation of the performance of our framework, compared to a set of single-criteria recommendation baselines, in terms of rating prediction accuracy.

2. Related work

A great amount of work has been dedicated to researching techniques for enhancing RSs by using data extracted from reviews. Chen et al. [1] and He et al. [2] provide reviews of the state of the art of review-aware RSs. There are three main types of approaches: Word-based, which directly use words found in the review as the user profile; Sentiment-based, which aim to extract the user’s overall rating of an item via Sentiment Analysis; and Aspect-based, which exploit multi-faceted ratings from reviews. Our work is strictly focused on aspect-based recommendation, extracting explicit factors from text reviews rather than latent factors (such as in [3, 4, 5]). The main advantage is that aspects can also be useful outside recommendation, e.g. for explanation.

Many works employ strategies such as topic modeling [6], sentiment lexicons [7], or rule-based systems [2, 8] in order to extract aspect-based ratings from reviews for recommendation purposes. The experiments performed in these works show that aspect-based ratings can indeed improve recommendation accuracy over single-criteria baselines. In our work, we instead perform the ATE task by using techniques based on Deep Learning.

In Musto et al. [9], ABSA is applied to a Multi-Criteria RS for the restaurant recommendation scenario using a tool called SABRE [10], which is able to extract relevant aspects from review text using the Kullback-Leibler divergence [11], as well as the rating assigned to each aspect. Aspects can also be organized into sub-aspects to obtain fine-grained information. Multi-criteria User-to-User and Item-to-Item CF algorithms were both proposed as recommendation algorithms. Our work follows a similar approach. In our framework, however, the ATE task is performed using state-of-the-art Deep Learning models.

ABSA has proven to be a very effective method for improving the accuracy, usefulness and persuasiveness of the recommendations. As a result, Natural Language Processing (NLP) research has focused on improving ABSA and ATE models, and more resources have been made available for these tasks. Examples of such resources are the SemEval datasets [12, 13, 14], and Hu and Liu [15].

Earlier works on ATE proposed strategies such as association rule mining [15], Conditional Random Fields (CRF) [16], knowledge-based topic modeling [17], or double propagation [18, 19]. In recent years, the success of Deep Learning models in Natural Language Processing tasks has moved the research focus towards using neural networks for ATE. Pavlopoulos and Androutsopoulos [20] improved the method described in [15] by using word embeddings generated via Word2Vec. Poria et al. [21] used Convolutional Neural Networks (CNNs) and several word embedding strategies. Giannakopoulos et al. [22] developed a model for both supervised and unsupervised ATE in large review datasets, based on Bi-Directional Long Short-Term Memory (Bi-LSTM) networks and CRF. Li and Lam [23] propose a multi-task learning framework for ATE and sentiment analysis based on LSTMs. Li et al. [24] use aspect detection history and opinion summary to enhance the ATE model. Some works investigate the addition of dependency relationships in order to improve the accuracy of neural network-based models, such as Ye et al. [25] and Luo et al. [26].

Finally, some works are focused on developing ATE methods that can generalize over different domains, using transfer learning or domain adaptation approaches. An early example is Jakob and Gurevych [16], which used a CRF-based approach. Ding et al. [27] use RNNs combined with rule-based auxiliary labels. Wang and Pan [28] incorporate dependency tree information using Recursive Neural Networks for both Aspect Term Extraction and Opinion Target Extraction tasks in order to transfer information between domains. Later, in [29] they introduce Transferable Interactive Memory Networks (TIMN) that can effectively model a representation for aspect terms across domains. Marcacini et al. [30] use transductive learning to map linguistic features of source and target domains in a heterogeneous network. Lee et al. [31] propose a transfer learning approach for ATE that is based on sequentially fine-tuning pre-trained features over different product groups. Pereg et al. [32] investigate the introduction of external syntactic features into a BERT-based model in order to exploit structural similarities of aspects across domains. Liang et al. [33] exploit the correlation between coarse-grained aspect categories and fine-grained aspect terms via a multi-level reconstruction mechanism. In our work, we not only evaluate the performance of several ATE approaches in a domain adaptation setting, but we also assess their effectiveness in improving the accuracy of the recommendations.

Recently, Da’u et al. [34] investigated the application of Deep Learning aspect extraction models for recommendation. While this work has the same premise as ours, there are two major differences: first, the architecture used is based on CNNs, while we included several configurations based on residual LSTM and BERT. Second, their work relies on the presence of annotated ATE data for the target domain, and does not deal with domain adaptation.

Based on this analysis, we have identified a gap in the literature. In fact, the papers mentioned above either describe domain adaptation strategies for ATE, or employ ATE for recommendation purposes. To the best of our knowledge, none combines the two ideas by explicitly measuring the impact of domain adaptation on the quality of the recommendations. We believe that this is very important, especially due to the extreme scarcity of annotated datasets for training ATE systems, which hinders their applicability to the recommendation scenario.

3. Aspect-based recommendation framework

In this section, we describe a novel review-aware aspect-based recommendation framework that has been created for the purposes of this study. We exploit user reviews in order to go beyond item ratings, by extracting richer aspect-based evaluations. The main advantage of this framework is that it lets us discover new aspects directly from user reviews. Additionally, the aspect-based item ratings enrich the user profile, as they let us understand which aspects users care more about. Finally, they allow us to identify the individual strengths and weaknesses of each item from the user’s point of view.

The proposed architecture is composed of several sub-modules, as shown in the example in Figure 1. The first one is the ATE module, which is in charge of identifying aspects mentioned in the user reviews, by extracting the corresponding aspect terms from the review text. The framework supports several ATE approaches, which will be detailed in Section 3.1.

The second component is the Aspect Clustering module, whose role is to group aspect terms that express similar concepts together into aspects. The Sentiment Analysis module works in parallel with the previous two. Its role is to extract the user’s sentiment from the review in order to assign a score to each aspect term. Details on this step will be discussed in Section 3.2.

The outputs of the Aspect Clustering and Sentiment Analysis modules are used to compose the aspect-based item ratings, which are organized into a 3-dimensional tensor (i.e. a tensor in which the first dimension represents the users, the second represents the items, and the third represents the aspect clusters) which is then passed to the Multi-Criteria recommendation algorithm. More details on this component are discussed in Section 3.3.

Figure 1 shows an example of execution of our framework. Each review is split into atomic sentences, and then each sentence is given as input to both the ATE module and the SA module, in order to extract both aspect terms and ratings. In the example, starting from the sentence “As always we had a great glass of wine while we waited”, the ATE module extracts the “glass of wine” aspect term, and the SA module assigns a positive rating to it. The extracted aspect is then given as input to the Aspect Clustering module, which assigns it to the right cluster, i.e. Beverage. The cluster information and the predicted sentiments are used to generate the aspect-based ratings tensor. The Recommendation Algorithm takes this tensor as input for generating a list of recommendations.

3.1. Aspect Term Extraction

This section is focused on describing the ATE component of the framework. ATE is one of the sub-tasks of ABSA [14].

Most approaches treat the task of extracting relevant aspects as a sequence labeling problem [21], in which the review is first tokenized, and then each token is classified as either being an aspect term or not. A classifier can be trained by supplying supervised data, i.e. pre-annotated reviews. The standard schema for annotating reviews is BIO tagging. According to this schema, three distinct labels can be associated with each token: B means that the token represents the beginning of an aspect term, I means that it represents the continuation of an aspect term, while O means that it is not an aspect term. This schema is shared with other sequence labeling tasks, such as Named Entity Recognition (NER).

Figure 2 shows the architecture of the ATE module. For this task, we focused on techniques based on Deep Learning, which have proven to be the most promising in the state of the art. In our study, we focused on the well-known BERT model and on the residual Bi-LSTM. BERT is one of the most recent pre-trained frameworks for NLP and it can be exploited for many tasks, including NER and ATE. The residual Bi-LSTM is a variant of the classical Bi-directional LSTM which was successfully used in other sequence labeling tasks such as Tran et al. [35]. It is composed of two stacked Bi-LSTM layers, where the sum of the output of the first and second layer is sent to the final softmax layer, instead of sending only the output of the second layer. Different embedding strategies have
Figure 1: Example of recommendation process
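The last step of the flow in Figure 1, composing the aspect-based ratings tensor, can be illustrated with a minimal Python sketch. It is not the authors’ implementation: the tuples, the cluster names other than Beverage, the 1-5 rating values, and the function name `build_tensor` are all hypothetical, and averaging several terms that fall into the same cluster is an assumption of this sketch. The fallback to the overall rating for unmentioned aspects follows the choice described in Section 3.3.

```python
from collections import defaultdict

# Hypothetical output of the ATE, clustering and SA modules:
# (user, item, aspect cluster, rating). Values are illustrative only.
aspect_ratings = [
    ("u1", "r1", "Beverage", 5.0),
    ("u1", "r1", "Service", 3.0),
    ("u2", "r1", "Beverage", 4.0),
]
# Overall ratings attached to the same reviews (also illustrative).
overall_ratings = {("u1", "r1"): 4.0, ("u2", "r1"): 4.0}
clusters = ["Beverage", "Food", "Service"]


def build_tensor(aspect_ratings, overall_ratings, clusters):
    """Return {(user, item): [one rating per cluster]}, i.e. a sparse
    view of the users x items x aspect-clusters tensor."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user, item, cluster, rating in aspect_ratings:
        sums[(user, item, cluster)] += rating
        counts[(user, item, cluster)] += 1

    tensor = {}
    for (user, item), overall in overall_ratings.items():
        row = []
        for cluster in clusters:
            key = (user, item, cluster)
            if counts.get(key, 0) > 0:
                # Several terms may map to one cluster: average them
                # (an assumption of this sketch).
                row.append(sums[key] / counts[key])
            else:
                # Aspect not mentioned in the review: use the overall rating.
                row.append(overall)
        tensor[(user, item)] = row
    return tensor


tensor = build_tensor(aspect_ratings, overall_ratings, clusters)
```

A dictionary keyed by (user, item) is used here only because real rating tensors are very sparse; a dense array would serve equally well for small examples.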



been used in order to encode the tokens into real-valued vectors. In particular, we aim to use the ability to capture a contextual representation of words to learn a model that is independent from the domain, i.e. that is able to extract aspect terms from reviews of any domain. In this way, we can exploit a model trained on a given domain to extract aspect terms from another, unseen domain. Hence the term domain adaptation.

The following is a list of all the ATE approaches that are included in the evaluation.

Pre-trained Word2Vec-Residual LSTM. Word2Vec is one of the first successful word embedding techniques, introduced in Mikolov et al. [36]. For this configuration, we employed embeddings that were previously trained on a part of the Google News datasets¹. The neural network architecture used in this configuration is the Residual Bi-directional Long Short-Term Memory (Bi-LSTM) described earlier.

Pre-trained GloVe-Residual LSTM. For this approach, we used a set of pre-trained embeddings from GloVe. GloVe is a model for distributed word representation, introduced in Pennington et al. [37]. It is developed as an open-source project at Stanford University, and the pre-trained embeddings are publicly available². The neural network architecture used is the Residual LSTM, as in the previous configuration.

ELMo embeddings-Residual LSTM. ELMo (Peters et al. [38]) stands for Embeddings from Language Models, and is a novel contextualized embedding strategy. That is, instead of using a single vector for each word in the dictionary, ELMo looks at the entire sentence before assigning each word in it its embedding. The result is that the embeddings generated by ELMo are deeply contextualized, and are more capable of handling polysemy. In this configuration, the architecture is defined as follows: an ELMo embedding layer is used, followed by the residual Bi-LSTM layers described in the previous configurations.

BERT. For this configuration, we employed BERT, introduced in Devlin et al. [39], which has been successfully applied in a variety of NLP tasks such as NER and text classification. Specifically, we employed a pre-trained BERT model available from the PyTorch library³. This model is then fine-tuned, i.e. its parameters are updated by training it on the ATE task. The NN architecture used by BERT is a multi-layer bidirectional Transformer encoder, as described in [39].

3.2. Aspect Term Clustering and Sentiment Analysis

As stated in the Introduction, one of the main problems of extracting aspect-based ratings from reviews is that users may refer to the same aspect in many different forms. Therefore, a strategy for grouping together all aspect forms that refer to the same concept is needed. We propose to group aspect terms together based on their Word2Vec representation. In the case of multi-word aspect terms, we calculate the average of the embeddings of each word. We then perform a clustering task by using the K-means algorithm. This allows us to automatically group aspect terms into aspect categories in an unsupervised way.

We then use the VADER sentiment analysis model offered by the NLTK library⁴ to obtain the rating assigned to each aspect term in the review. Each review is split into atomic sentences, which are fed to the sentiment analyzer in order to predict their polarity. We then use this sentiment to assign a score to all the aspect terms

¹ https://code.google.com/archive/p/word2vec/?fbclid=IwAR3poHsG_4PZdqfbR_JESidu9WLMf44ffd0A8ZFmrxCPiKTDghc5hQCLUeQ
² https://nlp.stanford.edu/projects/glove/?fbclid=IwAR3JafEUyzBT5kwgdKHcQH20nQeTzG1NZs2_BHAhuOgaluO0HC7P5WW6EC8
³ https://pypi.org/project/pytorch-pretrained-bert/
⁴ https://www.nltk.org/
Figure 2: Execution of the ATE task with the residual Bi-LSTM and BERT
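Whichever model in Figure 2 produces the labels, the BIO tags it emits still have to be turned back into aspect terms. The following Python sketch shows one way to do that decoding. The function name `decode_bio` and the tag sequence are illustrative assumptions (the tags are what a classifier would be expected to output for the example sentence of Figure 1), and treating a stray I tag with no preceding B as outside is a design choice of this sketch, not something the paper specifies.

```python
def decode_bio(tokens, tags):
    """Collect aspect terms from parallel lists of tokens and BIO tags."""
    terms, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new aspect term starts; close any term still open.
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continuation of the aspect term that is currently open.
            current.append(token)
        else:
            # "O", or a stray "I" with no open term (treated as outside).
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


tokens = "As always we had a great glass of wine while we waited".split()
# Hypothetical classifier output for the sentence above.
tags = ["O", "O", "O", "O", "O", "O", "B", "I", "I", "O", "O", "O"]
aspect_terms = decode_bio(tokens, tags)
print(aspect_terms)
```

With these tags the decoder returns the single multi-word term "glass of wine", which is the aspect term extracted in the example of Figure 1.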



appearing in that sentence. The final output is the transformation of each review into a set of (user, item, aspect, rating) tuples. This information will be the input to the Multi-Criteria RS.

3.3. Aspect-Based Multi-Criteria recommendation

Once the proposed framework has extracted all aspect-based ratings from the reviews, the last step is the recommendation. Recommendations are generated via a multi-criteria algorithm based on collaborative filtering [40]. For this purpose, we treated the sentiments extracted by our framework as the ratings given by the user to the item for each aspect. For each aspect that was not mentioned in the user review, we decided to assign the item’s overall rating. This choice was made empirically, as it improved the performance of the recommendation algorithm. The rest of this section contains a description of the recommendation algorithms.

User-to-User Multi-Criteria CF: This is an extension of the similarity-based approaches for CF. The distance $d(u_j, u_k)$ between users $u_j$ and $u_k$ is calculated using a multi-criteria distance function that takes the ratings given to each aspect into account (Equation 13 in [40]). For a new user-item pair, we generate a neighborhood of top-n most similar users, and then we calculate the predicted overall rating using the adjusted weighted sum of the neighbors’ ratings (Equation 3 in [40]).

Item-to-Item Multi-Criteria CF: This is the multi-criteria equivalent of the item-based CF technique. As for the previous technique, the distance $d(i_j, i_k)$ between items is calculated using a multi-criteria distance function (Equation 5 in [9]). For any given user-item pair, we generate a neighborhood of the top-n most similar items. The overall predicted rating is calculated using the item-based equivalent of the adjusted weighted sum approach found in [40].

Multi-Criteria SVD: This approach is based on Singular Value Decomposition (SVD), which is a matrix factorization technique. More details about the SVD technique can be found in Koren et al. [41]. This technique was originally developed for single-criteria RSs. In order to extend it to a multi-criteria scenario, we used a naive aggregation function-based approach [40, 42]: we divided the $k$-dimensional multi-criteria recommendation task into a set of $k$ single-criteria tasks. This means that we trained $k$ SVD models, one for each aspect $a_c$, for $c \in \{1, \dots, k\}$. Each model predicts the rating for a specific aspect $r_{a_c}(u, i)$. In order to predict the overall rating $r(u, i)$ for a given user $u$ and an item $i$, we calculate an aggregate function: $r(u, i) = f(r_{a_1}(u, i), \dots, r_{a_k}(u, i))$. In our case, the aggregate function is a simple average of the aspect-based ratings, i.e. $r(u, i) = \frac{1}{k} \sum_{c=1}^{k} r_{a_c}(u, i)$.

4. Evaluation

This section describes the in-vitro experiment that we set up to evaluate the performance of our framework. The experiment is divided into two parts. First, we evaluate the ATE models that were described in Section 3.1, in order to determine which one has the best performance when trained in a domain adaptation scenario. The second step of the experiment is the recommendation test: we extract aspect-based ratings from a dataset of restaurant reviews using the best ATE model from the previous test, and then we evaluate each of the multi-criteria recommendation approaches discussed in Section 3.3 in terms of their rating prediction accuracy. These approaches will also be compared to several baselines. This experiment will assess whether the multi-criteria recommendations generated by our framework are more accurate than the ones obtained by using single-criteria ratings.

4.1. Evaluation of the ATE approaches

We collected six datasets for the ATE task from the literature, three of which come from the SemEval ABSA
challenges with reviews about restaurants, laptops and hotels [12, 13, 14], while the other three are found in Liu et al. [18] and contain reviews about computers, speakers and routers. Table 1 reports the number of sentences and aspect terms contained in each dataset.

Table 1
Description of the datasets
 Dataset                             #Sentences   #Aspect terms
 Restaurants (SemEval 2014-15-16)    7841         8183
 Laptops (SemEval 2014)              3845         2918
 Hotels (SemEval 2015)               266          213
 Computers (Liu et al.)              531          363
 Speakers (Liu et al.)               689          454
 Routers (Liu et al.)                879          325

   A single domain study was conducted by training and testing each ATE model on the same dataset. The train-test split was performed via 5-fold cross-validation. The metrics used to evaluate the performance are Precision, Recall, and F1-score. An aspect term was considered correctly recognized only if all the tokens that compose it were correctly tagged by the system. Therefore, partial matches were not considered in the evaluation. For each configuration, we calculated the overall score by averaging the metrics obtained for each fold.
   In addition to the single domain study, we performed a domain adaptation experiment, which tests each model’s ability to generalize the ATE task onto a new, unseen domain. We performed six tests, one for each dataset. In each test, we used one dataset as the test set, and all remaining datasets as the training and development set, using a random 80-20 split.
   Table 2 reports the results of the experiments. Single refers to the single domain tests, while DA refers to the domain adaptation tests. We report the Precision, Recall and F1-measure for each dataset and each model.
   The table shows that the combination of ELMo embeddings with the residual Bi-LSTM is able to outperform all the other approaches, except for the domain adaptation scenario in the Laptops dataset, in which case BERT achieves slightly higher performance. Concerning the single domain experiment, it is also interesting to note that all four approaches perform better on the Restaurants dataset than on the Laptops dataset. This is not surprising, due to the fact that the Restaurants dataset is larger than the Laptops one. Even on the smaller datasets (Hotels, Speakers, Computers, Routers), ELMo still obtained the best performance.
   However, the situation is less clear for the other approaches. On the Hotels dataset, which is the smallest one, GloVe and Word2Vec obtain second and third place, with an F1 of 0.612 and 0.528 respectively. BERT is again last, with 0.332, which may suggest that this approach is especially affected by training set size. An interesting observation can be made about the Routers dataset: despite not being the smallest dataset, all approaches performed especially poorly on it.
   In the domain adaptation test, ELMo outperforms the other three models in five out of six datasets. We also compare the scores obtained from the single domain and domain transfer tests. In the largest datasets, we can observe that for ELMo the latter induces a substantial loss in F1 compared to the former: around 28% in the Restaurants domain, and around 47% in the Laptops domain. This loss can be attributed to the lack of domain-specific data in the respective domains. In the smaller datasets such as Hotels, the loss is either very small or nonexistent. Similar observations can be made for the BERT approach in the larger datasets. In the smaller datasets, however, the domain transfer configuration actually outperforms the single domain one. This gives more credibility to the hypothesis that BERT is more susceptible to training set size compared to ELMo. The GloVe and Word2Vec approaches show much larger losses. This is a clear indication that they are less capable of transferring knowledge on the ATE task from one domain to another.
   Based on the results from this Section, we can say with enough confidence that ELMo is the approach that obtained the best performance in the ATE task. Not only did it outperform the other three approaches in the single domain setting, but it also demonstrated a good ability to transfer the aspect extraction task across different domains. For this reason, we chose this approach as part of the ATE component of our framework.

4.2. Evaluation of the Recommender System
We performed an experiment to measure our framework’s recommendation accuracy. In particular, the objective of this experiment is to answer the following research questions:
   RQ1: What is the impact of domain adaptation strategies for ATE on the quality of multi-criteria recommendations?
   RQ2: How does our framework compare against several single-criteria baselines?
   For this experiment, we employed the Yelp Recruiting Competition dataset⁵, which contains restaurant reviews. This dataset is composed of 45,981 users, 11,537 items, and 229,906 reviews, with a sparsity of around 99.95%. Each record in the dataset contains the user ID, the business ID, the review text, and an overall score given by the user on a 1-5 scale. The review set was also filtered by excluding all users that rated fewer than 10 items. The filtered dataset contains 4,393 users, 10,801 items, and 138,301 reviews.

⁵ https://www.kaggle.com/c/yelp-recruiting/data
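The filtering step and the sparsity figure reported for the Yelp dataset can be illustrated with a minimal sketch. The function names, the (user_id, business_id, stars) tuple layout, and the assumption of at most one review per user-item pair are ours for illustration; they are not taken from the framework's implementation.

```python
from collections import Counter


def filter_active_users(reviews, min_ratings=10):
    """Keep reviews by users who rated at least `min_ratings` items.

    `reviews` is a list of (user_id, business_id, stars) tuples
    (hypothetical layout, assumed for this sketch).
    """
    ratings_per_user = Counter(user for user, _business, _stars in reviews)
    return [r for r in reviews if ratings_per_user[r[0]] >= min_ratings]


def sparsity(n_users, n_items, n_reviews):
    """Share of empty cells in the user-item rating matrix.

    Assumes each review fills a distinct (user, item) cell.
    """
    return 1.0 - n_reviews / (n_users * n_items)


if __name__ == "__main__":
    # Toy data: u1 wrote ten reviews, u2 only one.
    toy = [("u1", f"b{i}", 4) for i in range(10)] + [("u2", "b0", 2)]
    kept = filter_active_users(toy)
    print(len(kept))  # only the reviews by u1 survive the filter
    print(f"{sparsity(45_981, 11_537, 229_906):.4f}")
```

With the counts reported for the unfiltered dataset, `sparsity(45_981, 11_537, 229_906)` evaluates to roughly 0.9996, in line with the "around 99.95%" figure.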
Table 2
Results of the ATE task experiments
                            Speakers                           Computers                           Routers
                ELMo     BERT    GloVe     W2V     ELMo       BERT  GloVe      W2V     ELMo     BERT    GloVe     W2V
           P    0.682    0.372    0.486    0.452   0.506      0.334  0.448     0.462   0.462     0.24   0.424      0.24
 Single    R    0.516      0.4    0.338    0.38    0.521      0.286  0.306     0.394   0.388    0.168   0.226      0.14
           F1   0.576     0.38     0.39    0.408   0.514       0.3   0.332     0.41    0.406    0.188    0.29     0.174
           P     0.55    0.412     0.17    0.146    0.61       0.46  0.31      0.258    0.39    0.276   0.084     0.048
   DA      R    0.534     0.54     0.19    0.216   0.452      0.486  0.26      0.304   0.428    0.444   0.076     0.056
           F1   0.534    0.464    0.178    0.176    0.52      0.472  0.282     0.28    0.408    0.336   0.078     0.052

                            Laptops                              Hotels                         Restaurants
                ELMo     BERT    GloVe     W2V     ELMo       BERT   GloVe     W2V     ELMo     BERT   GloVe      W2V
           P    0.684    0.514   0.628     0.604   0.626       0.4    0.648    0.568   0.792    0.692  0.644      0.646
 Single    R     0.68    0.514   0.622     0.632    0.63      0.308   0.596     0.5    0.784    0.706  0.642      0.638
           F1   0.676     0.51   0.626     0.618   0.624      0.332   0.612    0.528   0.784    0.696  0.642      0.638
           P    0.508    0.436   0.092     0.08    0.648      0.592    0.61    0.542    0.67     0.59  0.186      0.186
   DA      R    0.282     0.31    0.04     0.046   0.624      0.672   0.552    0.464   0.496    0.364  0.096      0.096
           F1   0.358    0.36    0.056     0.06    0.632      0.628   0.578     0.5    0.564    0.444  0.126      0.126
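The F1 losses quoted in Section 4.1 (around 28% for Restaurants and around 47% for Laptops) can be reproduced from the ELMo columns of Table 2 if they are read as relative losses, i.e. (Single − DA) / Single. This reading is our assumption, although it matches the reported values; the helper name below is hypothetical.

```python
def relative_loss(single_f1, da_f1):
    """Relative F1 loss when moving from single-domain to domain adaptation.

    A negative value means the domain adaptation setting scored higher.
    """
    return (single_f1 - da_f1) / single_f1


# ELMo F1 scores copied from Table 2: (Single, DA)
elmo_f1 = {
    "Restaurants": (0.784, 0.564),
    "Laptops": (0.676, 0.358),
    "Hotels": (0.624, 0.632),
}

if __name__ == "__main__":
    for dataset, (single, da) in elmo_f1.items():
        print(f"{dataset}: {relative_loss(single, da):+.1%}")
```

For Hotels the value is slightly negative, which corresponds to the observation that the loss on the smaller datasets is very small or nonexistent.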



4.2.1. Experimental protocol
The dataset was input to our framework, and all the steps described in Section 3 were performed. Aspect terms were extracted by using the ELMo approach. For this experiment, we used two ATE models: one trained on all six datasets described in Section 4.1, and another trained without the Restaurants dataset, which allows us to assess the difference in recommendation quality caused by the lack of annotated ATE training data in the target domain.
   The aspect terms were then grouped together into 𝑘 aspects, and ratings were assigned via the Sentiment Analysis component described in Section 3.2, which transformed each review into a (𝑘 + 1)-dimensional vector, containing the user’s rating of the restaurant for each of the 𝑘 aspects, plus the overall rating. We experimented with different sizes of 𝑘 (10, 30 and 50) in order to increase the generality of the results. Finally, the aspect-based rating vectors were passed to the recommendation algorithms described in Section 3.3. We evaluated the rating prediction accuracy of the algorithms by measuring the Mean Absolute Error (MAE). 10-fold cross-validation was performed on the dataset, and the MAE values for each fold were averaged together. For each of the three multi-criteria recommendation algorithms (User-to-user, Item-to-item, and SVD), we chose the combination of parameters that obtained the best results. These models were then compared against several baselines: single-criteria user-to-user CF (with MSD and Pearson similarity measures), single-criteria item-to-item CF (with MSD and Pearson similarity measures), Singular Value Decomposition (SVD), and Non-negative Matrix Factorization (NMF), which were also trained and tested using 10-fold cross-validation. For both user-to-user and item-to-item CF baselines, we employed the variants that take into account the user and item means, to make them more comparable with the multi-criteria equivalents. This lets us understand whether the aspect-based ratings extracted by our framework actually cause an improvement in recommendation accuracy.

4.2.2. Results
Table 3 reports the results obtained by the three multi-criteria recommendation algorithms supported by our framework, with different combinations of parameters. For the user-to-user and item-to-item algorithms, we chose to set the neighborhood size to 10, 20, 30, 80, and 200. We chose these values because using a higher number of neighbors caused a decrease in accuracy. For all three algorithms, we can observe that the best performance is obtained by using 10 aspects. This means that by increasing the number of aspects, the performance decreases. This makes sense, since the effectiveness of the multi-criteria distance metrics largely depends on the number of commonly rated aspects between the two users (or the two items). Increasing the number of aspects also increases the sparsity of the aspect-based ratings, which makes these metrics less effective. Table 3 shows that the multi-criteria user-to-user algorithm performs best with a neighborhood size of 200, with an MAE of 0.8147 and 0.8155 respectively for the model trained with and without the Restaurants dataset. For the multi-criteria item-to-item variant, the best neighborhood size is 80 for the model trained with the Restaurants dataset, and 200 for the model trained without it. In both the neighborhood-based models, we can observe that the model trained without the Restaurants dataset performs slightly worse than the one trained with all datasets. This is consistent with the observations made during the experiment described in Section 4.1, i.e. the loss in recommendation accuracy may be caused by a loss in ATE accuracy. However, this is not true for the multi-criteria SVD approach. In fact, the model trained without the Restaurants dataset achieved better performance (MAE: 0.8053) compared to the one trained on all datasets (MAE: 0.8062). This suggests that this approach is less susceptible to the aspect-based rating sparsity problem. A Wilcoxon test was performed to evaluate the significance of these differences. The test confirms that they are all significant (𝑝 < 0.01). We can answer RQ1 by stating that the proposed domain adaptation strategy for ATE does indeed cause a small but statistically significant loss in recommendation performance in the multi-criteria user-to-user and item-to-item algorithms. However, it was also associated with an equally small increase in performance for the multi-criteria SVD algorithm.

Table 3
Results for the Multi-Criteria algorithms (MAE). The best results for each algorithm are in italic. The best overall results are in bold.
                    10 Aspects             30 Aspects             50 Aspects
 Algorithm    #N.   W/Rest.   W/O Rest.    W/Rest.   W/O Rest.    W/Rest.   W/O Rest.
 M.C. U2U     10    0.83      0.8306       0.8314    0.8333       0.8329    0.8349
 M.C. U2U     20    0.8196    0.8206       0.821     0.8228       0.8222    0.8244
 M.C. U2U     30    0.8169    0.8178       0.8182    0.8199       0.8194    0.8214
 M.C. U2U     80    0.8148    0.8157       0.8161    0.8176       0.8172    0.8191
 M.C. U2U     200   0.8147    0.8155       0.8159    0.8174       0.817     0.8189
 M.C. I2I     10    0.831     0.8321       0.8333    0.8346       0.8347    0.8364
 M.C. I2I     20    0.8221    0.8228       0.8239    0.8252       0.8252    0.8269
 M.C. I2I     30    0.82      0.8206       0.8216    0.8229       0.8228    0.8246
 M.C. I2I     80    0.8183    0.819        0.8199    0.8211       0.8211    0.8227
 M.C. I2I     200   0.8184    0.8189       0.8199    0.8211       0.8211    0.8227
 M.C. SVD     -     0.8062    0.8053       0.8064    0.8069       0.8074    0.8081

   Finally, in Table 4 we compare the performance of our framework with the baselines described earlier. We evaluated the single-criteria user-to-user and item-to-item baselines by setting the neighborhood size to 10, 20, 30, 80, and 200, and reported the best performance for each baseline. The results show that all three multi-criteria algorithms are able to outperform their single-criteria equivalents. The best result overall is achieved by the multi-criteria SVD on the model trained without restaurants. In fact, even though it relies on a basic aggregation function-based approach, it managed to obtain a significant improvement over all baselines. A Wilcoxon statistical test was performed in order to verify the significance of the difference in MAE. The test confirmed that the multi-criteria SVD approach performed significantly better than all the baselines (𝑝 < 0.01). This allows us to confidently answer RQ2 by stating that our framework compares favorably against all the selected baselines even when no domain-specific ATE data was available during training. This shows that the proposed domain adaptation approach is able to effectively exploit review data in order to improve the recommendation accuracy.

Table 4
Results of the recommendation test. Best results are in bold.
 Configuration            MAE
 M.C. U2U (W/ Rest.)      0.8147
 M.C. U2U (W/O Rest.)     0.8155
 U2U (MSD)                0.8169
 U2U (Pearson)            0.8565
 M.C. I2I (W/ Rest.)      0.8183
 M.C. I2I (W/O Rest.)     0.8189
 I2I (MSD)                0.8202
 I2I (Pearson)            0.8582
 M.C. SVD (W/ Rest.)      0.8062
 M.C. SVD (W/O Rest.)     0.8053
 SVD                      0.8107
 NMF                      0.8737

5. Conclusion
In this paper, we presented an investigation into the use of domain adaptation strategies in order to perform Aspect Term Extraction without the need for domain-specific training data, as well as the impact of using this strategy in a multi-criteria recommender system. For this purpose, we developed an aspect-based recommendation framework that automatically extracts multi-criteria ratings from text reviews using state-of-the-art Deep Learning ATE models. We performed several experiments to evaluate the ATE component both in a single domain and in a domain adaptation setting in order to find the best model to use in the multi-criteria recommendation scenario. We trained the aspect term extraction component twice, with and without domain-specific data, and tested several combinations of parameters and different multi-criteria recommendation algorithms in order to increase the generality of the results. In all cases, the framework was able to outperform its single-criteria equivalents, with small differences between the two models. Moreover, the proposed strategy improves the quality of the recommendations even when no domain-specific ATE training data is available.
   The most important limitation to the validity of our experiment is related to the small amount of data available for the ATE task. However, it is worth noting that this is a limitation of the state of the art, since all works on the subject use the same datasets (or a subset of them) that we used in our work. As future work, we plan to extend this work by including more recent Deep Learning architectures for ATE. We also plan to extend the recommendation test by including more multi-criteria recommendation algorithms, and by comparing our framework with systems that extract latent factors from reviews.
References

[1] L. Chen, G. Chen, F. Wang, Recommender systems based on user reviews: the state of the art, User Modeling and User-Adapted Interaction 25 (2015) 99–154. URL: http://link.springer.com/10.1007/s11257-015-9155-5. doi:10.1007/s11257-015-9155-5.
[2] X. He, T. Chen, M.-Y. Kan, X. Chen, TriRank: Review-aware Explainable Recommendation by Modeling Aspects, in: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management - CIKM ’15, ACM Press, Melbourne, Australia, 2015, pp. 1661–1670. URL: http://dl.acm.org/citation.cfm?doid=2806416.2806504. doi:10.1145/2806416.2806504.
[3] R. Catherine, W. Cohen, Transnets: Learning to transform for recommendation, in: Proceedings of the eleventh ACM conference on recommender systems, 2017, pp. 288–296.
[4] S. Seo, J. Huang, H. Yang, Y. Liu, Representation learning of users and items for review rating prediction using attention-based convolutional neural network, in: International Workshop on Machine Learning Methods for Recommender Systems, 2017.
[5] P. Li, A. Tuzhilin, Latent multi-criteria ratings for recommendations, in: Proceedings of the 13th ACM Conference on Recommender Systems, 2019, pp. 428–431.
[6] Q. Diao, M. Qiu, C.-Y. Wu, A. J. Smola, J. Jiang, C. Wang, Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS), in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’14, ACM Press, New York, New York, USA, 2014, pp. 193–202. URL: http://dl.acm.org/citation.cfm?doid=2623330.2623758. doi:10.1145/2623330.2623758.
[7] Y. Zhang, G. Lai, M. Zhang, Y. Zhang, Y. Liu, S. Ma, Explicit factor models for explainable recommendation based on phrase-level sentiment analysis, in: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval - SIGIR ’14, ACM Press, Gold Coast, Queensland, Australia, 2014, pp. 83–92. URL: http://dl.acm.org/citation.cfm?doid=2600428.2609579. doi:10.1145/2600428.2609579.
[8] K. Bauman, B. Liu, A. Tuzhilin, Recommending Items with Conditions Enhancing User Experiences Based on Sentiment Analysis of Reviews., in: CBRecSys@ RecSys, 2016, pp. 19–22.
[9] C. Musto, M. de Gemmis, G. Semeraro, P. Lops, A Multi-criteria Recommender System Exploiting Aspect-based Sentiment Analysis of Users’ Reviews, in: Proceedings of the Eleventh ACM Conference on Recommender Systems - RecSys ’17, ACM Press, Como, Italy, 2017, pp. 321–325. URL: http://dl.acm.org/citation.cfm?doid=3109859.3109905. doi:10.1145/3109859.3109905.
[10] A. Caputo, P. Basile, M. de Gemmis, P. Lops, G. Semeraro, G. Rossiello, SABRE: A Sentiment Aspect-Based Retrieval Engine, in: C. Lai, A. Giuliani, G. Semeraro (Eds.), Information Filtering and Retrieval: DART 2014: Revised and Invited Papers, Studies in Computational Intelligence, Springer International Publishing, Cham, 2017, pp. 63–78. URL: https://doi.org/10.1007/978-3-319-46135-9_4. doi:10.1007/978-3-319-46135-9_4.
[11] J. M. Joyce, Kullback-Leibler Divergence, Springer Berlin Heidelberg, Berlin, Heidelberg, 2011, pp. 720–722. URL: https://doi.org/10.1007/978-3-642-04898-2_327. doi:10.1007/978-3-642-04898-2_327.
[12] M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, S. Manandhar, SemEval-2014 Task 4: Aspect Based Sentiment Analysis (2014) 9.
[13] M. Pontiki, D. Galanis, H. Papageorgiou, S. Manandhar, I. Androutsopoulos, Semeval-2015 task 12: Aspect based sentiment analysis, in: Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015), 2015, pp. 486–495.
[14] M. Pontiki, D. Galanis, H. Papageorgiou, I. Androutsopoulos, S. Manandhar, A.-S. Mohammad, M. Al-Ayyoub, Y. Zhao, B. Qin, O. De Clercq, SemEval-2016 Task 5: Aspect Based Sentiment Analysis, in: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 2016, pp. 19–30.
[15] M. Hu, B. Liu, Mining and summarizing customer reviews, in: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’04, Association for Computing Machinery, Seattle, WA, USA, 2004, pp. 168–177. URL: https://doi.org/10.1145/1014052.1014073. doi:10.1145/1014052.1014073.
[16] N. Jakob, I. Gurevych, Extracting opinion targets in a single-and cross-domain setting with conditional random fields, in: Proceedings of the 2010 conference on empirical methods in natural language processing, Association for Computational Linguistics, 2010, pp. 1035–1045.
[17] Z. Chen, A. Mukherjee, B. Liu, M. Hsu, M. Castellanos, R. Ghosh, Exploiting domain knowledge in aspect extraction, in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013, pp. 1655–1667.
[18] Q. Liu, Z. Gao, B. Liu, Y. Zhang, Automated rule selection for aspect extraction in opinion mining, in: Twenty-Fourth International Joint Conference
on Artificial Intelligence, 2015.
[19] Q. Liu, B. Liu, Y. Zhang, D. S. Kim, Z. Gao, Improving opinion aspect extraction using semantic similarity and aspect associations, in: Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[20] J. Pavlopoulos, I. Androutsopoulos, Aspect Term Extraction for Sentiment Analysis: New Datasets, New Evaluation Measures and an Improved Unsupervised Method, in: Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM), Association for Computational Linguistics, Gothenburg, Sweden, 2014, pp. 44–52. URL: http://aclweb.org/anthology/W14-1306. doi:10.3115/v1/W14-1306.
[21] S. Poria, E. Cambria, A. Gelbukh, Aspect extraction for opinion mining with a deep convolutional neural network, Knowledge-Based Systems 108 (2016) 42–49. URL: https://linkinghub.elsevier.com/retrieve/pii/S0950705116301721. doi:10.1016/j.knosys.2016.06.009.
[22] A. Giannakopoulos, C. Musat, A. Hossmann, M. Baeriswyl, Unsupervised Aspect Term Extraction with B-LSTM & CRF using Automatically Labelled Datasets, in: Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 2017, pp. 180–188.
[23] X. Li, W. Lam, Deep Multi-Task Learning for Aspect Term Extraction with Memory Interaction, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Copenhagen, Denmark, 2017, pp. 2886–2892. URL: http://aclweb.org/anthology/D17-1310. doi:10.18653/v1/D17-1310.
[24] X. Li, L. Bing, P. Li, W. Lam, Z. Yang, Aspect term extraction with history attention and selective transformation, in: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018, pp. 4194–4200.
[25] H. Ye, Z. Yan, Z. Luo, W. Chao, Dependency-Tree Based Convolutional Neural Networks for Aspect Term Extraction, in: J. Kim, K. Shim, L. Cao, J.-G. Lee, X. Lin, Y.-S. Moon (Eds.), Advances in Knowledge Discovery and Data Mining, volume 10235, Springer International Publishing, Cham, 2017, pp. 350–362. URL: http://link.springer.com/10.1007/978-3-319-57529-2_28. doi:10.1007/978-3-319-57529-2_28, series Title: Lecture Notes in Computer Science.
[26] H. Luo, T. Li, B. Liu, B. Wang, H. Unger, Improving aspect term extraction with bidirectional dependency tree representation, IEEE/ACM Transactions on Audio, Speech, and Language Processing

with auxiliary labels for cross-domain opinion target extraction, in: Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[28] W. Wang, S. J. Pan, Recursive Neural Structural Correspondence Network for Cross-domain Aspect and Opinion Co-Extraction, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 2171–2181. URL: http://aclweb.org/anthology/P18-1202. doi:10.18653/v1/P18-1202.
[29] W. Wang, S. J. Pan, Transferable interactive memory network for domain adaptation in fine-grained opinion extraction, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 7192–7199. Issue: 01.
[30] R. M. Marcacini, R. G. Rossi, I. P. Matsuno, S. O. Rezende, Cross-domain aspect extraction for sentiment analysis: A transductive learning approach, Decision Support Systems 114 (2018) 70–80. URL: http://www.sciencedirect.com/science/article/pii/S0167923618301386. doi:10.1016/j.dss.2018.08.009.
[31] Y. Lee, M. Chung, S. Cho, J. Choi, Extraction of Product Evaluation Factors with a Convolutional Neural Network and Transfer Learning, Neural Processing Letters 50 (2019) 149–164. URL: https://doi.org/10.1007/s11063-018-9964-8. doi:10.1007/s11063-018-9964-8.
[32] O. Pereg, D. Korat, M. Wasserblat, Syntactically Aware Cross-Domain Aspect and Opinion Terms Extraction, in: Proceedings of the 28th International Conference on Computational Linguistics, International Committee on Computational Linguistics, Barcelona, Spain (Online), 2020, pp. 1772–1777. URL: https://www.aclweb.org/anthology/2020.coling-main.158. doi:10.18653/v1/2020.coling-main.158.
[33] T. Liang, W. Wang, F. Lv, Weakly Supervised Domain Adaptation for Aspect Extraction via Multi-level Interaction Transfer, IEEE Transactions on Neural Networks and Learning Systems (2021). Publisher: IEEE.
[34] A. Da’u, N. Salim, I. Rabiu, A. Osman, Recommendation system exploiting aspect-based opinion mining with deep learning method, Information Sciences 512 (2020) 1279–1292. Publisher: Elsevier.
[35] Q. Tran, A. MacKinlay, A. J. Yepes, Named Entity Recognition with stack residual LSTM and trainable bias decoding, arXiv:1706.07598 [cs] (2017). URL: http://arxiv.org/abs/1706.07598, arXiv: 1706.07598.
[36] T. Mikolov, K. Chen, G. Corrado, J. Dean, Effi-
     27 (2019) 1201–1212. Publisher: IEEE.                                        cient Estimation of Word Representations in Vec-
[27] Y. Ding, J. Yu, J. Jiang, Recurrent neural networks                          tor Space, arXiv:1301.3781 [cs] (2013). URL: http:
     //arxiv.org/abs/1301.3781, arXiv: 1301.3781.
[37] J. Pennington, R. Socher, C. Manning, Glove: Global
     Vectors for Word Representation, in: Proceedings
     of the 2014 Conference on Empirical Methods in
     Natural Language Processing (EMNLP), Associa-
     tion for Computational Linguistics, Doha, Qatar,
     2014, pp. 1532–1543. URL: https://www.aclweb.org/
     anthology/D14-1162. doi:1 0 . 3 1 1 5 / v 1 / D 1 4 - 1 1 6 2 .
[38] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner,
     C. Clark, K. Lee, L. Zettlemoyer, Deep contextual-
     ized word representations, arXiv:1802.05365 [cs]
     (2018). URL: http://arxiv.org/abs/1802.05365, arXiv:
     1802.05365.
[39] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT:
     Pre-training of Deep Bidirectional Transformers for
     Language Understanding, arXiv:1810.04805 [cs]
     (2019). URL: http://arxiv.org/abs/1810.04805, arXiv:
     1810.04805.
[40] G. Adomavicius, Y. Kwon, New Recommendation
     Techniques for Multicriteria Rating Systems, IEEE
     Intelligent Systems 22 (2007) 48–55. doi:1 0 . 1 1 0 9 /
     MIS.2007.58.
[41] Y. Koren, R. Bell, C. Volinsky, Matrix factorization
     techniques for recommender systems, Computer
     42 (2009) 30–37. Publisher: IEEE.
[42] F. Ricci, L. Rokach, B. Shapira, P. B. Kan-
     tor (Eds.), Recommender Systems Handbook,
     Springer US, Boston, MA, 2011. URL: http://link.
     springer.com/10.1007/978-0-387-85820-3. doi:1 0 .
     1007/978- 0- 387- 85820- 3.