=Paper= {{Paper |id=Vol-2960/paper15 |storemode=property |title=A General Aspect-Term-Extraction Model for Multi-Criteria Recommendations (Long paper) |pdfUrl=https://ceur-ws.org/Vol-2960/paper15.pdf |volume=Vol-2960 |authors=Paolo Pastore,Andrea Iovine,Fedelucio Narducci,Giovanni Semeraro |dblpUrl=https://dblp.org/rec/conf/recsys/PastoreINS21 }} ==A General Aspect-Term-Extraction Model for Multi-Criteria Recommendations (Long paper)== https://ceur-ws.org/Vol-2960/paper15.pdf

A General Aspect-Term-Extraction Model for Multi-Criteria
Recommendations
Paolo Pastore¹, Andrea Iovine², Fedelucio Narducci¹ and Giovanni Semeraro²
¹ Polytechnic University of Bari, Italy
² Dept. of Computer Science, University of Bari, Italy

Abstract
In recent years, increasingly large quantities of user reviews have been made available by several e-commerce platforms.
This content is very useful for recommender systems (RSs), since it reflects the users’ opinion of the items regarding several
aspects. In fact, reviews are especially valuable for RSs that are able to exploit multi-faceted user ratings. However, extracting
aspect-based ratings from unstructured text is not a trivial task. Deep Learning models for aspect extraction have proven
to be effective, but they need to be trained on large quantities of domain-specific data, which are not always available. In
this paper, we explore the possibility of transferring knowledge across domains for automatically extracting aspects from
user reviews, and its implications in terms of recommendation accuracy. We performed different experiments with several
Deep Learning-based Aspect Term Extraction (ATE) techniques and Multi-Criteria recommendation algorithms. Results
show that our framework is able to improve recommendation accuracy compared to several baselines based on single-criteria
recommendation, despite the fact that no labeled data in the target domain was used when training the ATE model.

Keywords
multi-criteria recommendation, deep learning, aspect term extraction, domain adaptation, transfer learning

3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) & 5th Edition of Recommendation in Complex Environments (ComplexRec) Joint Workshop @ RecSys 2021, September 27–1 October 2021, Amsterdam, Netherlands
paolo.pastore1@poliba.it (P. Pastore); andrea.iovine@uniba.it (A. Iovine); fedelucio.narducci@poliba.it (F. Narducci); giovanni.semeraro@uniba.it (G. Semeraro)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

Nowadays, many Web platforms and e-commerce websites allow customers to express their opinions by providing reviews on items, services, or media. Such user-generated content is extremely valuable for recommendation, since it reflects the user’s perception of a specific item and of specific features of that item, listing its strengths and weaknesses, the most important features, and the tasks for which it is more (or less) suitable. Extracting this information and exploiting it to enrich user profiles and item descriptions can give enormous advantages to Recommender Systems (RSs). Given the considerable importance of reviews in the recommendation process, many works in the literature proposed the idea of integrating them into RSs, as a way to improve their accuracy. Specifically, text reviews can be a solution to the rating sparsity problem often encountered by RSs based on Collaborative Filtering (CF), and can be used to capture a much more fine-grained model of the customer’s preferences [1]. Accordingly, instead of modeling the user’s profile as a set of (item, rating) pairs, it might be represented as a set of (item, aspect, rating) triples. Of course, the problem with this approach is that both aspects and ratings must be extracted automatically from unstructured text. This task is usually referred to as Aspect-Based Sentiment Analysis (ABSA). ABSA is not a trivial task, because there is no stable definition of "aspect", due to its intrinsic subjectivity. Also, the same aspect can appear in many different forms inside user reviews. For instance, a reviewer could use "service", "staff" or "waiter" to refer to the "service" category. For this reason, we distinguish between the aspect itself and its representation forms in the reviews, also called aspect terms. Furthermore, the aspects used in a domain are completely different from those in other domains: for restaurants, users will mention features such as the food or the quality of the service; when talking about smartphones, they will instead refer to other aspects such as the screen or the camera. In recent years, many Deep Learning models for automatically extracting aspects from text have been proposed. However, these techniques need to be trained on domain-specific labeled datasets that are not always available.

In this paper, we investigate the application of domain adaptation strategies for aspect-based recommendation. The aim is to evaluate the effectiveness of modern Deep Learning-based Aspect Term Extraction (ATE) models when no annotated data is available for the target domain. For this purpose, we developed an aspect-based recommendation framework that includes an ATE module, an Aspect Clustering module, a Sentiment Analysis (SA) module, and a Multi-Criteria Recommender System. We performed an experimental study to compare several ATE models both in a single domain scenario and in a domain adaptation setting. We then chose the
model that obtained the best performance in both settings, i.e. the model that is most able to capture the essential, domain-invariant characteristics of aspect terms. Finally, we tested the framework in a recommendation scenario, to understand whether the models involved in this study actually improve the accuracy of RSs, compared to single-criteria recommendation baselines. This will show whether our framework is able to successfully extract fine-grained ratings from text, and exploit them for improving the quality of the recommendations.

In summary, the main contributions of this work are:
(a) The definition of a novel framework for aspect-based recommendation, that can automatically extract aspect-based ratings from unstructured text (i.e. reviews) independently from the domain, using Deep Learning models;
(b) An evaluation of the performance of Deep Learning-based ATE models in a domain adaptation setting (i.e. when no annotated data in the target domain is available);
(c) An evaluation of the performance of our framework, compared to a set of single-criteria recommendation baselines, in terms of rating prediction accuracy.

2. Related work

A great amount of work has been dedicated to researching techniques for enhancing RSs by using data extracted from reviews. Chen et al. [1] and He et al. [2] provide a review of the state of the art of review-aware RSs. There are three main types of approaches: Word-based, which consists of directly using words found in the review as the user profile; Sentiment-based, which aims to extract the user’s overall rating of an item via Sentiment Analysis; and Aspect-based, which exploits multi-faceted ratings from reviews. Our work is strictly focused on aspect-based recommendation, extracting explicit factors from text reviews rather than latent factors (such as in [3, 4, 5]). The main advantage is that aspects can also be useful outside recommendation, e.g. for explanation.

Many works employ strategies such as topic modeling [6], sentiment lexicons [7], or rule-based systems [2, 8] in order to extract aspect-based ratings from reviews for recommendation purposes. The experiments performed in these works prove that aspect-based ratings can indeed improve recommendation accuracy over single-criteria baselines. In our work, we instead perform the ATE task by using techniques based on Deep Learning.

In Musto et al. [9], ABSA is applied to a Multi-Criteria RS for the restaurant recommendation scenario using a tool called SABRE [10], which is able to extract relevant aspects from review text using the Kullback-Leibler divergence [11], as well as the rating assigned to each aspect. Aspects can also be organized into sub-aspects to obtain fine-grained information. Multi-criteria User-to-User and Item-to-Item CF algorithms were both proposed as recommendation algorithms. Our work follows a similar approach. In our framework, however, the ATE task is performed using state-of-the-art Deep Learning models.

ABSA has proven to be a very effective method for improving the accuracy, usefulness and persuasiveness of the recommendations. As a result, Natural Language Processing (NLP) research focused on improving ABSA and ATE models, and more resources have been made available for these tasks. Examples of such resources are the SemEval datasets [12, 13, 14], and Hu and Liu [15].

Earlier works on ATE proposed strategies such as association rule mining [15], Conditional Random Fields (CRF) [16], knowledge-based topic modeling [17], or double propagation [18, 19]. In recent years, the success of Deep Learning models in Natural Language Processing tasks meant that research focus has moved towards using neural networks for ATE. Pavlopoulos and Androutsopoulos [20] improved the method described in [15] by using word embeddings generated via Word2Vec. Poria et al. [21] used Convolutional Neural Networks (CNNs) and several word embedding strategies. Giannakopoulos et al. [22] developed a model for both supervised and unsupervised ATE in large review datasets, based on Bi-Directional Long-Short Term Memory (Bi-LSTM) networks and CRF. Li and Lam [23] propose a multi-task learning framework for ATE and sentiment analysis based on LSTMs. Li et al. [24] use aspect detection history and opinion summary to enhance the ATE model. Some works investigate the addition of dependency relationships in order to improve the accuracy of neural network-based models, such as Ye et al. [25] and Luo et al. [26].

Finally, some works are focused on developing ATE methods that can generalize over different domains, using transfer learning or domain adaptation approaches. An early example is Jakob and Gurevych [16], which used a CRF-based approach. Ding et al. [27] use RNNs combined with rule-based auxiliary labels. Wang and Pan [28] incorporate dependency tree information using Recursive Neural Networks for both Aspect Term Extraction and Opinion Target Extraction tasks in order to transfer information between domains. Later, in [29] they introduce Transferable Interactive Memory Networks (TIMN) that can effectively model a representation for aspect terms across domains. Marcacini et al. [30] use transductive learning to map linguistic features of source and target domains in a heterogeneous network. Lee et al. [31] propose a transfer learning approach for ATE that is based on sequentially fine-tuning pre-trained features over different product groups. Pereg et al. [32] investigate the introduction of external syntactic features into a BERT-based model in order to exploit structural similarities of aspects across domains. Liang et al. [33] exploit the correlation between coarse-grained aspect categories and fine-grained aspect terms via a multi-level reconstruction mechanism. In our work, we not only evaluate the performance of several ATE approaches in a domain adaptation setting, but we also assess their effectiveness in improving the accuracy of the recommendations.

Recently, Da’u et al. [34] investigated the application of Deep Learning aspect extraction models for recommendation. While this work has the same premise as ours, there are two major differences: first, the architecture used is based on CNNs, while we included several configurations based on residual LSTM and BERT. Second, their work relies on the presence of annotated ATE data for the target domain, and does not deal with domain adaptation.

Based on this analysis of the literature, we have identified a gap. In fact, the papers mentioned above either describe domain adaptation strategies for ATE, or employ ATE for recommendation purposes. To the best of our knowledge, none combine the two ideas together, by explicitly measuring the impact of domain adaptation on the quality of the recommendations. We believe that this is very important, especially due to the extreme scarcity of annotated datasets for training ATE systems, which hinders their applicability to the recommendation scenario.

3. Aspect-based recommendation framework

In this section, we describe a novel review-aware aspect-based recommendation framework that has been created for the purposes of this study. We exploit user reviews in order to go beyond item ratings, by extracting richer aspect-based evaluations. The main advantage of this framework is that it lets us discover new aspects directly from user reviews. Additionally, the aspect-based item ratings enrich the user profile, as they let us understand which aspects users care more about. Finally, they allow us to identify the individual strengths and weaknesses of each item from the user’s point of view.

The proposed architecture is composed of several sub-modules, as shown in the example in Figure 1. The first one is the ATE module, which is in charge of identifying aspects mentioned in the user reviews, by extracting the corresponding aspect terms from the review text. The framework supports several ATE approaches, which will be detailed in Section 3.1.

The second component is the Aspect Clustering module, whose role is to group aspect terms that express similar concepts together into aspects. The Sentiment Analysis module works in parallel with the previous two. Its role is to extract the user’s sentiment from the review in order to assign a score to each aspect term. Details on this step will be discussed in Section 3.2.

The outputs of the Aspect Clustering and Sentiment Analysis modules are used to compose the aspect-based item ratings, which are organized into a 3-dimensional tensor (i.e. a tensor in which the first dimension represents the users, the second represents the items, and the third represents the aspect clusters), which is then passed to the Multi-Criteria recommendation algorithm. More details on this component are discussed in Section 3.3.

Figure 1 shows an example of execution of our framework. Each review is split into atomic sentences, and then each sentence is given as input to both the ATE module and the SA module, in order to extract both aspect terms and ratings. In the example, starting from the sentence "As always we had a great glass of wine while we waited", the ATE module extracts the "glass of wine" aspect term, and the SA module assigns a positive rating to it. The extracted aspect is then given as input to the Aspect Clustering module, which assigns it to the right cluster, i.e. Beverage. The cluster information and the predicted sentiments are used to generate the aspect-based ratings tensor. The Recommendation Algorithm takes this tensor as input for generating a list of recommendations.

3.1. Aspect Term Extraction

This section is focused on describing the ATE component of the framework. ATE is one of the sub-tasks of ABSA [14].

Most approaches treat the task of extracting relevant aspects as a sequence labeling problem [21], in which the review is first tokenized, and then each token is classified as either being an aspect term or not. A classifier can be trained by supplying supervised data, i.e. pre-annotated reviews. The standard schema for annotating reviews is BIO tagging. According to this schema, three distinct labels can be associated with each token: B means that the token represents the beginning of an aspect term, I means that it represents the continuation of an aspect term, while O means that it is not an aspect term. This schema is shared with other sequence labeling tasks, such as Named Entity Recognition (NER).

Figure 2 shows the architecture of the ATE module. For this task, we focused on techniques based on Deep Learning, which have proven to be the most promising in the state of the art. In our study, we focused on the well-known BERT model and on the residual Bi-LSTM. BERT is one of the most recent pre-trained frameworks for NLP and it can be exploited for many tasks, including NER and ATE. The residual Bi-LSTM is a variant of the classical Bi-directional LSTM which was successfully used in other sequence labeling tasks such as Tran et al. [35]. It is composed of two stacked Bi-LSTM layers, where the sum of the output of the first and second layer is sent to the final softmax layer, instead of sending only the output of the second layer. Different embedding strategies have
Figure 1: Example of recommendation process
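The process illustrated in Figure 1 can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the three helper functions are hypothetical stand-ins that return fixed answers for the example sentence, the 5/3 rating scale is an assumption, and a nested dictionary is used as a sparse stand-in for the users x items x aspect clusters tensor.

```python
from collections import defaultdict

# Hypothetical stand-ins for the trained modules (assumptions, not real models).
def extract_aspect_terms(sentence):
    """ATE stub: returns the aspect terms found in a sentence."""
    return ["glass of wine"] if "glass of wine" in sentence else []

def sentence_rating(sentence):
    """SA stub: assumed scale, 5 for a positive sentence and 3 otherwise."""
    return 5 if "great" in sentence else 3

# Assumed output of the Aspect Clustering module for this example.
TERM_TO_CLUSTER = {"glass of wine": "Beverage"}

def add_review(tensor, user, item, sentences):
    """Write one rating per (user, item, aspect cluster) cell.

    Assumption: a later mention of the same cluster overwrites an earlier one.
    """
    for sentence in sentences:
        rating = sentence_rating(sentence)
        for term in extract_aspect_terms(sentence):
            tensor[user][item][TERM_TO_CLUSTER[term]] = rating
    return tensor

# Sparse 3-dimensional tensor: users x items x aspect clusters.
tensor = defaultdict(lambda: defaultdict(dict))
add_review(tensor, "u1", "restaurant_42",
           ["As always we had a great glass of wine while we waited"])
print(tensor["u1"]["restaurant_42"])  # {'Beverage': 5}
```

The user and item identifiers are made up for the example; only the sentence, the aspect term and the Beverage cluster come from the text.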

been used in order to encode the tokens into real-valued vectors. In particular, we aim to use the ability to capture a contextual representation of words to learn a model that is independent from the domain, i.e. that is able to extract aspect terms from reviews of any domain. In this way, we can exploit a model trained on a given domain to extract aspect terms from another, unseen domain. Hence, the definition of domain adaptation.

The following is a list of all the ATE approaches that are included in the evaluation.

Pre-trained Word2Vec-Residual LSTM. Word2Vec is one of the first successful word embedding techniques, introduced in Mikolov et al. [36]. For this configuration, we employed embeddings that were previously trained on part of the Google News dataset¹. The neural network architecture used in this configuration is the Residual Bi-directional Long-Short Term Memory (LSTM) described earlier.

Pre-trained GloVe-Residual LSTM. For this approach, we used a set of pre-trained embeddings from GloVe. GloVe is a model for distributed word representation, introduced in Pennington et al. [37]. It is developed as an open-source project at Stanford University, and the pre-trained embeddings are publicly available². The neural network architecture used is the Residual LSTM, as in the previous configuration.

ELMo embeddings-Residual LSTM. ELMo (Peters et al. [38]) stands for Embeddings from Language Models, and is a novel contextualized embedding strategy. That is, instead of using a single vector for each word in the dictionary, ELMo looks at the entire sentence before assigning each word in it its embedding. The result is that the embeddings generated by ELMo are deeply contextualized, and are more capable of handling polysemy. In this configuration, the architecture is defined as follows: an ELMo embedding layer is used, followed by the residual Bi-LSTM layers described in the previous configurations.

BERT. For this configuration, we employed BERT, introduced in Devlin et al. [39], which has been successfully applied in a variety of NLP tasks such as NER and text classification. Specifically, we employed a pre-trained BERT model available for PyTorch³. This model is then fine-tuned, i.e. its parameters are updated by training it on the ATE task. The NN architecture used by BERT is a multi-layer bidirectional Transformer encoder, as described in [39].

3.2. Aspect Term Clustering and Sentiment Analysis

As stated in the Introduction, one of the main problems of extracting aspect-based ratings from reviews is that users may refer to the same aspect in many different forms. Therefore, a strategy for grouping together all aspect forms that refer to the same concept is needed. We propose to group aspect terms together based on their Word2Vec representation. In the case of multi-word aspect terms, we calculated the average of the embeddings of each word. We then perform a clustering task by using the K-means algorithm. This allows us to automatically group aspect terms into aspect categories in an unsupervised way.

We then used the VADER sentiment analysis model offered by the NLTK library⁴ to obtain the rating assigned to each aspect term in the review. Each review is split into atomic sentences, which are fed to the sentiment analyzer in order to predict their polarity. We then use this sentiment to assign a score to all the aspect terms

¹ https://code.google.com/archive/p/word2vec/?fbclid=IwAR3poHsG_4PZdqfbR_JESidu9WLMf44ffd0A8ZFmrxCPiKTDghc5hQCLUeQ
² https://nlp.stanford.edu/projects/glove/?fbclid=IwAR3JafEUyzBT5kwgdKHcQH20nQeTzG1NZs2_BHAhuOgaluO0HC7P5WW6EC8
³ https://pypi.org/project/pytorch-pretrained-bert/
⁴ https://www.nltk.org/
Figure 2: Execution of the ATE task with the residual Bi-LSTM and BERT
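Whichever network is used, the output of the ATE task shown in Figure 2 is one B, I or O label per token, and aspect terms are recovered by grouping each B with the I labels that follow it. The following is a minimal sketch of that decoding step, not the authors' code: the function name `decode_bio` is hypothetical, the tag sequence is written by hand rather than predicted, and treating a stray I (one with no preceding B) as O is an assumption.

```python
def decode_bio(tokens, tags):
    """Collect aspect terms from per-token B/I/O labels."""
    terms, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":                  # a new aspect term starts here
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:    # continuation of the open term
            current.append(token)
        else:                           # "O", or a stray "I" (assumed to be "O")
            if current:
                terms.append(" ".join(current))
            current = []
    if current:                         # flush a term that ends the sentence
        terms.append(" ".join(current))
    return terms

tokens = "As always we had a great glass of wine while we waited".split()
# Hand-written labels for illustration; a trained model would predict these.
tags = ["O", "O", "O", "O", "O", "O", "B", "I", "I", "O", "O", "O"]
print(decode_bio(tokens, tags))  # ['glass of wine']
```

Because a term is only emitted as a whole B-I-I span, this decoding also matches the exact-match evaluation criterion, under which partially tagged terms do not count.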

appearing in that sentence. The final output is the transformation of each review into a set of (user, item, aspect, rating) tuples. This information will be the input to the Multi-Criteria RS.

3.3. Aspect-Based Multi-Criteria recommendation

Once the proposed framework has extracted all aspect-based ratings from the reviews, the last step is the recommendation. Recommendations are generated via a multi-criteria algorithm based on collaborative filtering [40]. For this purpose, we treated the sentiments extracted by our framework as the ratings given by the user to the item for each aspect. For each aspect that was not mentioned in the user review, we decided to assign the item’s overall rating. This choice was made empirically, as it improved the performance of the recommendation algorithm. The rest of this section contains a description of the recommendation algorithms.

User-to-User Multi-Criteria CF: This is an extension of the similarity-based approaches for CF. The distance $d(u_j, u_k)$ between users $u_j$ and $u_k$ is calculated using a multi-criteria distance function that takes the ratings given to each aspect into account (Equation 13 in [40]). For a new user-item pair, we generate a neighborhood of top-n most similar users, and then we calculate the predicted overall rating using the adjusted weighted sum of the neighbors’ ratings (Equation 3 in [40]).

Item-to-Item Multi-Criteria CF: This is the multi-criteria equivalent of the item-based CF technique. As for the previous technique, the distance $d(i_j, i_k)$ between items is calculated using a multi-criteria distance function (Equation 5 in [9]). For any given user-item pair, we generate a neighborhood of the top-n most similar items. The overall predicted rating is calculated using the item-based equivalent of the adjusted weighted sum approach found in [40].

Multi-Criteria SVD: This approach is based on Singular Value Decomposition (SVD), which is a matrix factorization technique. More details about the SVD technique can be found in Koren et al. [41]. This technique was originally developed for single-criteria RSs. In order to extend it to a multi-criteria scenario, we used a naive aggregation function-based approach [40, 42]: we divided the $k$-dimensional multi-criteria recommendation task into a set of $k$ single-criteria tasks. This means that we trained $k$ SVD models, one for each aspect $a_c$, for $c \in \{1, ..., k\}$. Each model predicts the rating for a specific aspect $r_{a_c}(u, i)$. In order to predict the overall rating $r(u, i)$ for a given user $u$ and an item $i$, we calculate an aggregate function: $r(u, i) = f(r_{a_1}(u, i), ..., r_{a_k}(u, i))$. In our case, the aggregate function is a simple average of the aspect-based ratings.

4. Evaluation

This section describes the in-vitro experiment that we set up to evaluate the performance of our framework. The experiment is divided into two parts. First, we evaluate the ATE models that were described in Section 3.1, in order to determine which one has the best performance when trained in a domain adaptation scenario. The second step of the experiment is the recommendation test: we extract aspect-based ratings from a dataset of restaurant reviews using the best ATE model from the previous test, and then we evaluate each of the multi-criteria recommendation approaches discussed in Section 3.3 in terms of their rating prediction accuracy. These approaches will also be compared to several baselines. This experiment will assess whether the multi-criteria recommendations generated by our framework are more accurate than the ones obtained by using single-criteria ratings.

4.1. Evaluation of the ATE approaches

We collected six datasets for the ATE task from the literature, three of which come from the SemEval ABSA
challenges with reviews about restaurants, laptops and hotels [12, 13, 14], while the other three are found in Liu et al. [18] and contain reviews about computers, speakers and routers. Table 1 reports the number of sentences and aspect terms contained in each dataset.

Table 1
Description of the datasets
Dataset                            #Sentences  #Aspect terms
Restaurants (SemEval 2014-15-16)   7841        8183
Laptops (SemEval 2014)             3845        2918
Hotels (SemEval 2015)              266         213
Computers (Liu et al.)             531         363
Speakers (Liu et al.)              689         454
Routers (Liu et al.)               879         325

A single domain study was conducted by training and testing each ATE model on the same dataset. Train-test split was performed via 5-fold cross validation. The metrics used to evaluate the performance are Precision, Recall, and F1-score. An aspect term was considered correctly recognized if all the tokens that compose it were correctly tagged by the system. Therefore, partial matches were not considered in the evaluation. For each configuration, we calculated the overall score by averaging the metrics obtained for each fold.

In addition to the single domain study, we performed a domain adaptation experiment, which tests each model’s ability to generalize the ATE task onto a new, unseen domain. We performed six tests, one for each dataset. In each test, we used one dataset as the test set, and all remaining datasets as the training and development set, using a random 80-20 split.

Table 2 reports the results of the experiments. Single refers to the single domain tests, while DA refers to the domain adaptation tests. We report the Precision, Recall and F1-measure for each dataset and each model.

The table shows that the combination of ELMo embeddings with the residual Bi-LSTM is able to outperform all the other approaches, except for the domain adaptation scenario in the Laptops dataset, in which case BERT achieves slightly higher performance. Concerning the single domain experiment, it is also interesting to note that all four approaches perform better on the Restaurants dataset than on the Laptops dataset. This is not surprising, due to the fact that the Restaurants dataset is larger than the Laptops one. Even on the smaller datasets (Hotels, Speakers, Computers, Routers), ELMo still obtained the best performance.

However, the situation is less clear for the other approaches. On the Hotels dataset, which is the smallest one, GloVe and Word2Vec obtain second and third place, with an F1 of 0.612 and 0.528 respectively. BERT is again last, with 0.332, which may suggest that this approach is especially affected by training set size. An interesting observation can be made about the Routers dataset: despite not being the smallest dataset, all approaches performed especially poorly on it.

In the domain adaptation test, ELMo outperforms the other three models in five out of six datasets. We also compare the scores obtained from the single domain and domain transfer tests. In the largest datasets, we can observe that for ELMo the latter induces a substantial relative loss in F1 compared to the former: around 28% in the Restaurants domain, and around 47% in the Laptops domain. This loss can be attributed to the lack of domain-specific data in the respective domains. In the smaller datasets such as Hotels, the loss is either very small, or nonexistent. Similar observations can be made for the BERT approach in the larger datasets. In the smaller datasets, however, the domain transfer configuration actually outperforms the single domain one. This gives more credibility to the hypothesis that BERT is more susceptible to training set size compared to ELMo. The GloVe and Word2Vec approaches show much larger losses. This is a clear indication that they are less capable of transferring knowledge on the ATE task from one domain to another.

Based on the results from this Section, we can say with enough confidence that ELMo is the approach that obtained the best performance in the ATE task. Not only did it outperform the other three approaches in the single domain setting, but it also demonstrated a good ability to transfer the aspect extraction task over different domains. For this reason, we chose this approach as part of the ATE component of our framework.

4.2. Evaluation of the Recommender System

We performed an experiment to measure our framework’s recommendation accuracy. In particular, the objective of this experiment is to answer the following research questions:
RQ1: What is the impact of domain adaptation strategies for ATE on the quality of multi-criteria recommendations?
RQ2: How does our framework compare against several single-criteria baselines?

For this experiment, we employed the Yelp Recruiting Competition dataset⁵, which contains restaurant reviews. This dataset is composed of 45,981 users, 11,537 items, and 229,906 reviews, with a sparsity of around 99.95%. Each entry in the dataset contains the user ID, the business ID, the review text, and an overall score given by the user on a 1-5 scale. The review set was also filtered by excluding all users that rated fewer than 10 items. The filtered dataset contains 4,393 users, 10,801 items, and 138,301 reviews.

⁵ https://www.kaggle.com/c/yelp-recruiting/data
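The sparsity figure and the user filter described for the Yelp dataset can be reproduced with a short sketch. This is an illustration under stated assumptions, not the authors' preprocessing code: the function names are hypothetical, reviews are assumed to be (user, item, stars) tuples, and the number of reviews per user is used as a proxy for the number of items rated.

```python
from collections import Counter

def sparsity(n_users, n_items, n_ratings):
    """Fraction of empty cells in the user-item rating matrix."""
    return 1.0 - n_ratings / (n_users * n_items)

def filter_users(reviews, min_reviews=10):
    """Keep only reviews written by users with at least `min_reviews` reviews.

    Assumption: each review is a (user, item, stars) tuple and a user
    reviews a given item at most once.
    """
    counts = Counter(user for user, _item, _stars in reviews)
    return [review for review in reviews if counts[review[0]] >= min_reviews]

# Sizes reported in the text for the unfiltered dataset.
print(f"{sparsity(45_981, 11_537, 229_906):.2%}")

# Toy data: user "a" has 10 reviews and is kept, user "b" has 2 and is dropped.
toy = [("a", f"item{n}", 4) for n in range(10)] + [("b", "item0", 5), ("b", "item1", 2)]
print(len(filter_users(toy)))
```

With the reported sizes the formula gives a value just under 99.96%, which agrees with the "around 99.95%" quoted above up to rounding.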
Table 2
Results of the ATE task experiments
            Speakers                      Computers                     Routers
            ELMo   BERT   GloVe  W2V      ELMo   BERT   GloVe  W2V      ELMo   BERT   GloVe  W2V
Single  P   0.682  0.372  0.486  0.452    0.506  0.334  0.448  0.462    0.462  0.24   0.424  0.24
Single  R   0.516  0.4    0.338  0.38     0.521  0.286  0.306  0.394    0.388  0.168  0.226  0.14
Single  F1  0.576  0.38   0.39   0.408    0.514  0.3    0.332  0.41     0.406  0.188  0.29   0.174
DA      P   0.55   0.412  0.17   0.146    0.61   0.46   0.31   0.258    0.39   0.276  0.084  0.048
DA      R   0.534  0.54   0.19   0.216    0.452  0.486  0.26   0.304    0.428  0.444  0.076  0.056
DA      F1  0.534  0.464  0.178  0.176    0.52   0.472  0.282  0.28     0.408  0.336  0.078  0.052

            Laptops                       Hotels                        Restaurants
            ELMo   BERT   GloVe  W2V      ELMo   BERT   GloVe  W2V      ELMo   BERT   GloVe  W2V
Single  P   0.684  0.514  0.628  0.604    0.626  0.4    0.648  0.568    0.792  0.692  0.644  0.646
Single  R   0.68   0.514  0.622  0.632    0.63   0.308  0.596  0.5      0.784  0.706  0.642  0.638
Single  F1  0.676  0.51   0.626  0.618    0.624  0.332  0.612  0.528    0.784  0.696  0.642  0.638
DA      P   0.508  0.436  0.092  0.08     0.648  0.592  0.61   0.542    0.67   0.59   0.186  0.186
DA      R   0.282  0.31   0.04   0.046    0.624  0.672  0.552  0.464    0.496  0.364  0.096  0.096
DA      F1  0.358  0.36   0.056  0.06     0.632  0.628  0.578  0.5      0.564  0.444  0.126  0.126
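The F1 losses quoted in the discussion of Table 2 (around 28% for Restaurants and 47% for Laptops) are relative losses for the ELMo model, and they can be recomputed from the table. The helper below is a hypothetical illustration of that arithmetic; the F1 values are copied from the Single and DA rows of Table 2.

```python
def relative_loss(single_f1, da_f1):
    """Relative drop in F1 when moving from single domain to domain adaptation."""
    return (single_f1 - da_f1) / single_f1

# ELMo F1 scores from Table 2: (single domain, domain adaptation).
elmo_f1 = {
    "Restaurants": (0.784, 0.564),
    "Laptops": (0.676, 0.358),
    "Hotels": (0.624, 0.632),
}

for dataset, (single, da) in elmo_f1.items():
    print(f"{dataset}: {relative_loss(single, da):.0%}")
# Restaurants: 28%
# Laptops: 47%
# Hotels: -1%
```

A negative value, as for Hotels, means the domain adaptation model scored slightly higher than the single domain one, which is the "nonexistent loss" case mentioned in the text.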

4.2.1. Experimental protocol

The dataset was input to our framework, and all the steps described in Section 3 were performed. Aspect terms were extracted using the ELMo approach. For this experiment, we used two ATE models: one trained on all six datasets described in Section 4.1, and another trained without the Restaurants datasets. Comparing the two allows us to assess the difference in recommendation quality caused by the lack of annotated ATE training data in the target domain.

The aspect terms were then grouped into 𝑘 aspects, and ratings were assigned via the Sentiment Analysis component described in Section 3.2, which transformed each review into a (𝑘 + 1)-dimensional vector containing the user’s rating of the restaurant for each of the 𝑘 aspects, plus the overall rating. We experimented with different values of 𝑘 (10, 30 and 50) in order to increase the generality of the results. Finally, the aspect-based rating vectors were passed to the recommendation algorithms described in Section 3.3. We evaluated the rating prediction accuracy of the algorithms by measuring the Mean Absolute Error (MAE). 10-fold cross-validation was performed on the dataset, and the MAE values of the individual folds were averaged. For each of the three multi-criteria recommendation algorithms (user-to-user, item-to-item, and SVD), we chose the combination of parameters that obtained the best results. These models were then compared against several baselines: single-criteria user-to-user CF (with MSD and Pearson similarity measures), single-criteria item-to-item CF (with MSD and Pearson similarity measures), Singular Value Decomposition (SVD), and Non-negative Matrix Factorization (NMF), which were also trained and tested using 10-fold cross-validation. For both user-to-user and item-to-item CF baselines, we employed the variants that take into account the user and item means, to make them more comparable with the multi-criteria equivalents. This lets us understand whether the aspect-based ratings extracted by our framework actually improve recommendation accuracy.

4.2.2. Results

Table 3 reports the results obtained by the three multi-criteria recommendation algorithms supported by our framework, with different combinations of parameters. For the user-to-user and item-to-item algorithms, we set the neighborhood size to 10, 20, 30, 80, and 200. We did not go beyond 200 because a higher number of neighbors caused a decrease in accuracy. For all three algorithms, the best performance is obtained with 10 aspects, and performance decreases as the number of aspects grows. This makes sense, since the effectiveness of the multi-criteria distance metrics largely depends on the number of aspects rated in common by the two users (or the two items). Increasing the number of aspects also increases the sparsity of the aspect-based ratings, which makes these metrics less effective. Table 3 shows that the multi-criteria user-to-user algorithm performs best with a neighborhood size of 200, with an MAE of 0.8147 and 0.8155 for the models trained with and without the Restaurants dataset, respectively. For the multi-criteria item-to-item variant, the best neighborhood size is 80 for the model trained with the Restaurants dataset, and 200 for the model trained without it. In both neighborhood-based models, the model trained without the Restaurants dataset performs slightly worse than the one trained with all datasets. This is consistent with the observations made during the experiment described in Section 4.1, i.e. the loss in recommendation accuracy may be caused by a loss in ATE accuracy. However, this is not true for the multi-criteria SVD approach: the model trained without the Restaurants dataset achieved better performance (MAE: 0.8053) than the one trained on all datasets (MAE: 0.8062). This suggests that this approach is less susceptible to the aspect-based rating sparsity problem. A Wilcoxon test was performed to evaluate the significance of these differences, and it confirms that they are all significant (𝑝 < 0.01). We can answer RQ1 by stating that the proposed domain adaptation strategy for ATE does indeed cause a small but significant loss in recommendation performance for the multi-criteria user-to-user and item-to-item algorithms. However, it was also associated with an equally small improvement for the multi-criteria SVD algorithm.

Table 3
Results for the Multi-Criteria algorithms (MAE). The best results for each algorithm are in italic. The best overall results are in bold.

                  10 Aspects           30 Aspects           50 Aspects
Algorithm  #N.    W/Rest.  W/O Rest.   W/Rest.  W/O Rest.   W/Rest.  W/O Rest.
M.C. U2U   10     0.83     0.8306      0.8314   0.8333      0.8329   0.8349
M.C. U2U   20     0.8196   0.8206      0.821    0.8228      0.8222   0.8244
M.C. U2U   30     0.8169   0.8178      0.8182   0.8199      0.8194   0.8214
M.C. U2U   80     0.8148   0.8157      0.8161   0.8176      0.8172   0.8191
M.C. U2U   200    0.8147   0.8155      0.8159   0.8174      0.817    0.8189
M.C. I2I   10     0.831    0.8321      0.8333   0.8346      0.8347   0.8364
M.C. I2I   20     0.8221   0.8228      0.8239   0.8252      0.8252   0.8269
M.C. I2I   30     0.82     0.8206      0.8216   0.8229      0.8228   0.8246
M.C. I2I   80     0.8183   0.819       0.8199   0.8211      0.8211   0.8227
M.C. I2I   200    0.8184   0.8189      0.8199   0.8211      0.8211   0.8227
M.C. SVD   -      0.8062   0.8053      0.8064   0.8069      0.8074   0.8081

Finally, in Table 4 we compare the performance of our framework with the baselines described earlier. We evaluated the single-criteria user-to-user and item-to-item baselines with neighborhood sizes of 10, 20, 30, 80, and 200, and reported the best performance for each baseline. The results show that all three multi-criteria algorithms outperform their single-criteria equivalents. The best result overall is achieved by the multi-criteria SVD on the model trained without the Restaurants dataset. Even though it is based on a basic aggregation function-based approach, it obtained a significant improvement over all baselines. A Wilcoxon statistical test was performed to verify the significance of the difference in MAE, and it confirmed that the multi-criteria SVD approach performed significantly better than all the baselines (𝑝 < 0.01). This allows us to answer RQ2 by stating that our framework compares favorably against all the selected baselines even when no domain-specific ATE data was available during training. This indicates that the proposed domain adaptation approach is able to effectively exploit review data in order to improve recommendation accuracy.

Table 4
Results of the recommendation test. Best results are in bold.

Configuration          MAE
M.C. U2U (W/ Rest.)    0.8147
M.C. U2U (W/O Rest.)   0.8155
U2U (MSD)              0.8169
U2U (Pearson)          0.8565
M.C. I2I (W/ Rest.)    0.8183
M.C. I2I (W/O Rest.)   0.8189
I2I (MSD)              0.8202
I2I (Pearson)          0.8582
M.C. SVD (W/ Rest.)    0.8062
M.C. SVD (W/O Rest.)   0.8053
SVD                    0.8107
NMF                    0.8737

5. Conclusion

In this paper, we presented an investigation on the use of domain adaptation strategies to perform Aspect Term Extraction without the need for domain-specific training data, as well as the impact of using this strategy in a multi-criteria recommender system. For this purpose, we developed an aspect-based recommendation framework that automatically extracts multi-criteria ratings from text reviews using state-of-the-art Deep Learning ATE models. We performed several experiments to evaluate the ATE component both in a single domain and in a domain adaptation setting, in order to find the best model to use in the multi-criteria recommendation scenario. We trained the aspect term extraction component twice, with and without domain-specific data, and tested several combinations of parameters and different multi-criteria recommendation algorithms in order to increase the generality of the results. In all cases, the framework was able to outperform single-criteria baselines, with small differences between the two models. Moreover, the proposed strategy improves the quality of the recommendations even when no domain-specific ATE training data is available.

The most important limitation to the validity of our experiment is related to the small amount of data available for the ATE task. However, it is worth noting that this is a limitation of the state of the art, since all works on the subject use the same datasets (or a subset of them) that we used in our work. As future work, we plan to extend this work by including more recent Deep Learning architectures for ATE. We also plan to extend the recommendation test by including more multi-criteria recommendation algorithms, and by comparing our framework with systems that extract latent factors from reviews.
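
To make the evaluation protocol of Section 4.2.1 more concrete, the following is a minimal sketch, not the authors' code, of how the MAE could be computed on each of 10 cross-validation folds and then averaged. The function names, the global-mean predictor, and the toy ratings are all assumptions introduced for illustration; in the experiments the predictors are the multi-criteria and single-criteria algorithms listed in Table 4.

```python
import random
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between true and predicted ratings."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle the sample indices and split them into n_folds near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validated_mae(ratings, fit, predict, n_folds=10, seed=0):
    """Train on n_folds - 1 folds, test on the held-out fold, average the fold MAEs."""
    fold_maes = []
    for held_out in k_fold_indices(len(ratings), n_folds, seed):
        held_out_set = set(held_out)
        train = [r for i, r in enumerate(ratings) if i not in held_out_set]
        test = [ratings[i] for i in held_out]
        model = fit(train)
        y_true = [rating for _, _, rating in test]
        y_pred = [predict(model, user, item) for user, item, _ in test]
        fold_maes.append(mean_absolute_error(y_true, y_pred))
    return mean(fold_maes)


# Hypothetical stand-in predictor: always predicts the mean training rating.
def fit_global_mean(train):
    return mean(rating for _, _, rating in train)


def predict_global_mean(model, user, item):
    return model


# Toy (user, item, overall_rating) triples, invented for illustration only.
toy_ratings = [(u, i, float(1 + (u + i) % 5)) for u in range(10) for i in range(10)]
print(round(cross_validated_mae(toy_ratings, fit_global_mean, predict_global_mean), 4))
```

Replacing `fit_global_mean` and `predict_global_mean` with another pair of functions that have the same signatures leaves the fold handling unchanged, so different algorithms are scored on identical splits as long as the same seed is used.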
References Conference on Recommender Systems - RecSys
’17, ACM Press, Como, Italy, 2017, pp. 321–325.
[1] L. Chen, G. Chen, F. Wang, Recommender URL: http://dl.acm.org/citation.cfm?doid=3109859.
systems based on user reviews: the state of 3109905. doi:1 0 . 1 1 4 5 / 3 1 0 9 8 5 9 . 3 1 0 9 9 0 5 .
the art, User Modeling and User-Adapted [10] A. Caputo, P. Basile, M. de Gemmis, P. Lops, G. Se-
Interaction 25 (2015) 99–154. URL: http://link. meraro, G. Rossiello, SABRE: A Sentiment Aspect-
springer.com/10.1007/s11257-015-9155-5. doi:1 0 . Based Retrieval Engine, in: C. Lai, A. Giuliani,
1007/s11257- 015- 9155- 5. G. Semeraro (Eds.), Information Filtering and Re-
[2] X. He, T. Chen, M.-Y. Kan, X. Chen, TriRank: trieval: DART 2014: Revised and Invited Papers,
Review-aware Explainable Recommendation by Studies in Computational Intelligence, Springer
Modeling Aspects, in: Proceedings of the 24th International Publishing, Cham, 2017, pp. 63–78.
ACM International on Conference on Information URL: https://doi.org/10.1007/978-3-319-46135-9_4.
and Knowledge Management - CIKM ’15, ACM doi:1 0 . 1 0 0 7 / 9 7 8 - 3 - 3 1 9 - 4 6 1 3 5 - 9 _ 4 .
Press, Melbourne, Australia, 2015, pp. 1661–1670. [11] J. M. Joyce, Kullback-Leibler Divergence,
URL: http://dl.acm.org/citation.cfm?doid=2806416. Springer Berlin Heidelberg, Berlin, Hei-
2806504. doi:1 0 . 1 1 4 5 / 2 8 0 6 4 1 6 . 2 8 0 6 5 0 4 . delberg, 2011, pp. 720–722. URL: https:
[3] R. Catherine, W. Cohen, Transnets: Learning to //doi.org/10.1007/978-3-642-04898-2_327.
transform for recommendation, in: Proceedings doi:1 0 . 1 0 0 7 / 9 7 8 - 3 - 6 4 2 - 0 4 8 9 8 - 2 _ 3 2 7 .
of the eleventh ACM conference on recommender [12] M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageor-
systems, 2017, pp. 288–296. giou, I. Androutsopoulos, S. Manandhar, SemEval-
[4] S. Seo, J. Huang, H. Yang, Y. Liu, Representation 2014 Task 4: Aspect Based Sentiment Analysis
learning of users and items for review rating pre- (2014) 9.
diction using attention-based convolutional neural [13] M. Pontiki, D. Galanis, H. Papageorgiou, S. Man-
network, in: International Workshop on Machine andhar, I. Androutsopoulos, Semeval-2015 task 12:
Learning Methods for Recommender Systems, 2017. Aspect based sentiment analysis, in: Proceedings
[5] P. Li, A. Tuzhilin, Latent multi-criteria ratings for of the 9th international workshop on semantic eval-
recommendations, in: Proceedings of the 13th ACM uation (SemEval 2015), 2015, pp. 486–495.
Conference on Recommender Systems, 2019, pp. [14] M. Pontiki, D. Galanis, H. Papageorgiou, I. Androut-
428–431. sopoulos, S. Manandhar, A.-S. Mohammad, M. Al-
[6] Q. Diao, M. Qiu, C.-Y. Wu, A. J. Smola, J. Jiang, Ayyoub, Y. Zhao, B. Qin, O. De Clercq, SemEval-
C. Wang, Jointly modeling aspects, ratings and 2016 Task 5: Aspect Based Sentiment Analysis, in:
sentiments for movie recommendation (JMARS), Proceedings of the 10th International Workshop
in: Proceedings of the 20th ACM SIGKDD interna- on Semantic Evaluation (SemEval-2016), 2016, pp.
tional conference on Knowledge discovery and data 19–30.
mining - KDD ’14, ACM Press, New York, New York, [15] M. Hu, B. Liu, Mining and summarizing cus-
USA, 2014, pp. 193–202. URL: http://dl.acm.org/ tomer reviews, in: Proceedings of the tenth ACM
citation.cfm?doid=2623330.2623758. doi:1 0 . 1 1 4 5 / SIGKDD international conference on Knowledge
2623330.2623758. discovery and data mining, KDD ’04, Association
[7] Y. Zhang, G. Lai, M. Zhang, Y. Zhang, Y. Liu, for Computing Machinery, Seattle, WA, USA, 2004,
S. Ma, Explicit factor models for explainable recom- pp. 168–177. URL: https://doi.org/10.1145/1014052.
mendation based on phrase-level sentiment anal- 1014073. doi:1 0 . 1 1 4 5 / 1 0 1 4 0 5 2 . 1 0 1 4 0 7 3 .
ysis, in: Proceedings of the 37th international [16] N. Jakob, I. Gurevych, Extracting opinion targets
ACM SIGIR conference on Research & development in a single-and cross-domain setting with condi-
in information retrieval - SIGIR ’14, ACM Press, tional random fields, in: Proceedings of the 2010
Gold Coast, Queensland, Australia, 2014, pp. 83–92. conference on empirical methods in natural lan-
URL: http://dl.acm.org/citation.cfm?doid=2600428. guage processing, Association for Computational
2609579. doi:1 0 . 1 1 4 5 / 2 6 0 0 4 2 8 . 2 6 0 9 5 7 9 . Linguistics, 2010, pp. 1035–1045.
[8] K. Bauman, B. Liu, A. Tuzhilin, Recommending [17] Z. Chen, A. Mukherjee, B. Liu, M. Hsu, M. Castel-
Items with Conditions Enhancing User Experiences lanos, R. Ghosh, Exploiting domain knowledge in
Based on Sentiment Analysis of Reviews., in: aspect extraction, in: Proceedings of the 2013 Con-
CBRecSys@ RecSys, 2016, pp. 19–22. ference on Empirical Methods in Natural Language
[9] C. Musto, M. de Gemmis, G. Semeraro, P. Lops, Processing, 2013, pp. 1655–1667.
A Multi-criteria Recommender System Exploiting [18] Q. Liu, Z. Gao, B. Liu, Y. Zhang, Automated rule
Aspect-based Sentiment Analysis of Users’ Re- selection for aspect extraction in opinion mining,
views, in: Proceedings of the Eleventh ACM in: Twenty-Fourth International Joint Conference
on Artificial Intelligence, 2015. with auxiliary labels for cross-domain opinion tar-
[19] Q. Liu, B. Liu, Y. Zhang, D. S. Kim, Z. Gao, Im- get extraction, in: Thirty-First AAAI Conference
proving opinion aspect extraction using semantic on Artificial Intelligence, 2017.
similarity and aspect associations, in: Thirtieth [28] W. Wang, S. J. Pan, Recursive Neural Structural
AAAI Conference on Artificial Intelligence, 2016. Correspondence Network for Cross-domain As-
[20] J. Pavlopoulos, I. Androutsopoulos, Aspect Term pect and Opinion Co-Extraction, in: Proceedings
Extraction for Sentiment Analysis: New Datasets, of the 56th Annual Meeting of the Association
New Evaluation Measures and an Improved Un- for Computational Linguistics (Volume 1: Long
supervised Method, in: Proceedings of the 5th Papers), Association for Computational Linguis-
Workshop on Language Analysis for Social Media tics, Melbourne, Australia, 2018, pp. 2171–2181.
(LASM), Association for Computational Linguistics, URL: http://aclweb.org/anthology/P18-1202. doi:1 0 .
Gothenburg, Sweden, 2014, pp. 44–52. URL: http: 18653/v1/P18- 1202.
//aclweb.org/anthology/W14-1306. doi:1 0 . 3 1 1 5 / v 1 / [29] W. Wang, S. J. Pan, Transferable interactive mem-
W14- 1306. ory network for domain adaptation in fine-grained
[21] S. Poria, E. Cambria, A. Gelbukh, Aspect ex- opinion extraction, in: Proceedings of the AAAI
traction for opinion mining with a deep convolu- Conference on Artificial Intelligence, volume 33,
tional neural network, Knowledge-Based Systems 2019, pp. 7192–7199. Issue: 01.
108 (2016) 42–49. URL: https://linkinghub.elsevier. [30] R. M. Marcacini, R. G. Rossi, I. P. Matsuno, S. O.
com/retrieve/pii/S0950705116301721. doi:1 0 . 1 0 1 6 / Rezende, Cross-domain aspect extraction for
j.knosys.2016.06.009. sentiment analysis: A transductive learning ap-
[22] A. Giannakopoulos, C. Musat, A. Hossmann, proach, Decision Support Systems 114 (2018)
M. Baeriswyl, Unsupervised Aspect Term Extrac- 70–80. URL: http://www.sciencedirect.com/science/
tion with B-LSTM & CRF using Automatically La- article/pii/S0167923618301386. doi:1 0 . 1 0 1 6 / j . d s s .
belled Datasets, in: Proceedings of the 8th Work- 2018.08.009.
shop on Computational Approaches to Subjectiv- [31] Y. Lee, M. Chung, S. Cho, J. Choi, Extraction of
ity, Sentiment and Social Media Analysis, 2017, pp. Product Evaluation Factors with a Convolutional
180–188. Neural Network and Transfer Learning, Neural
[23] X. Li, W. Lam, Deep Multi-Task Learning for Aspect Processing Letters 50 (2019) 149–164. URL: https:
Term Extraction with Memory Interaction, in: Pro- //doi.org/10.1007/s11063-018-9964-8. doi:1 0 . 1 0 0 7 /
ceedings of the 2017 Conference on Empirical Meth- s11063- 018- 9964- 8.
ods in Natural Language Processing, Association [32] O. Pereg, D. Korat, M. Wasserblat, Syntac-
for Computational Linguistics, Copenhagen, Den- tically Aware Cross-Domain Aspect and Opin-
mark, 2017, pp. 2886–2892. URL: http://aclweb.org/ ion Terms Extraction, in: Proceedings of the
anthology/D17-1310. doi:1 0 . 1 8 6 5 3 / v 1 / D 1 7 - 1 3 1 0 . 28th International Conference on Computational
[24] X. Li, L. Bing, P. Li, W. Lam, Z. Yang, Aspect term ex- Linguistics, International Committee on Compu-
traction with history attention and selective trans- tational Linguistics, Barcelona, Spain (Online),
formation, in: Proceedings of the 27th International 2020, pp. 1772–1777. URL: https://www.aclweb.org/
Joint Conference on Artificial Intelligence, 2018, pp. anthology/2020.coling-main.158. doi:1 0 . 1 8 6 5 3 / v 1 /
4194–4200. 2020.coling- main.158.
[25] H. Ye, Z. Yan, Z. Luo, W. Chao, Dependency- [33] T. Liang, W. Wang, F. Lv, Weakly Supervised Do-
Tree Based Convolutional Neural Networks for main Adaptation for Aspect Extraction via Multi-
Aspect Term Extraction, in: J. Kim, K. Shim, level Interaction Transfer, IEEE Transactions on
L. Cao, J.-G. Lee, X. Lin, Y.-S. Moon (Eds.), Ad- Neural Networks and Learning Systems (2021). Pub-
vances in Knowledge Discovery and Data Mining, lisher: IEEE.
volume 10235, Springer International Publishing, [34] A. Da’u, N. Salim, I. Rabiu, A. Osman, Recommenda-
Cham, 2017, pp. 350–362. URL: http://link.springer. tion system exploiting aspect-based opinion mining
com/10.1007/978-3-319-57529-2_28. doi:1 0 . 1 0 0 7 / with deep learning method, Information Sciences
9 7 8 - 3 - 3 1 9 - 5 7 5 2 9 - 2 _ 2 8 , series Title: Lecture Notes 512 (2020) 1279–1292. Publisher: Elsevier.
in Computer Science. [35] Q. Tran, A. MacKinlay, A. J. Yepes, Named Entity
[26] H. Luo, T. Li, B. Liu, B. Wang, H. Unger, Improv- Recognition with stack residual LSTM and trainable
ing aspect term extraction with bidirectional de- bias decoding, arXiv:1706.07598 [cs] (2017). URL:
pendency tree representation, IEEE/ACM Transac- http://arxiv.org/abs/1706.07598, arXiv: 1706.07598.
tions on Audio, Speech, and Language Processing [36] T. Mikolov, K. Chen, G. Corrado, J. Dean, Effi-
27 (2019) 1201–1212. Publisher: IEEE. cient Estimation of Word Representations in Vec-
[27] Y. Ding, J. Yu, J. Jiang, Recurrent neural networks tor Space, arXiv:1301.3781 [cs] (2013). URL: http:
//arxiv.org/abs/1301.3781, arXiv: 1301.3781.
[37] J. Pennington, R. Socher, C. Manning, Glove: Global
Vectors for Word Representation, in: Proceedings
of the 2014 Conference on Empirical Methods in
Natural Language Processing (EMNLP), Associa-
tion for Computational Linguistics, Doha, Qatar,
2014, pp. 1532–1543. URL: https://www.aclweb.org/
anthology/D14-1162. doi:1 0 . 3 1 1 5 / v 1 / D 1 4 - 1 1 6 2 .
[38] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner,
C. Clark, K. Lee, L. Zettlemoyer, Deep contextual-
ized word representations, arXiv:1802.05365 [cs]
(2018). URL: http://arxiv.org/abs/1802.05365, arXiv:
1802.05365.
[39] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT:
Pre-training of Deep Bidirectional Transformers for
Language Understanding, arXiv:1810.04805 [cs]
(2019). URL: http://arxiv.org/abs/1810.04805, arXiv:
1810.04805.
[40] G. Adomavicius, Y. Kwon, New Recommendation
Techniques for Multicriteria Rating Systems, IEEE
Intelligent Systems 22 (2007) 48–55. doi:1 0 . 1 1 0 9 /
MIS.2007.58.
[41] Y. Koren, R. Bell, C. Volinsky, Matrix factorization
techniques for recommender systems, Computer
42 (2009) 30–37. Publisher: IEEE.
[42] F. Ricci, L. Rokach, B. Shapira, P. B. Kan-
tor (Eds.), Recommender Systems Handbook,
Springer US, Boston, MA, 2011. URL: http://link.
springer.com/10.1007/978-0-387-85820-3. doi:1 0 .
1007/978- 0- 387- 85820- 3.