
Are User-Generated Item Reviews Actually Beneficial for Recommendation?

Tzu-Hua Kao

Lea Dahm

Tobias Eichinger

Technische Universität Berlin, Straße des 17. Juni 135, Berlin, 10623, Germany


User-generated item reviews are widely believed to represent a valuable source of information for recommendation. However, a recent empirical analysis of review-based algorithms by Sachdeva and McAuley calls this belief into question. In this paper, we analyze the recommender systems literature that seeks to improve recommendation by using item reviews as auxiliary information. We identify the ways in which the information condensed in item reviews is represented. We then point out particular goals, such as performance improvement, and problems, such as cold-start and sparsity, that have been addressed by using item reviews. We arrive at the same conclusion as Sachdeva and McAuley that item reviews can be beneficial, yet are not beneficial per se. The field is saturated with methods that leverage item reviews yet lacks studies on when and why certain methods are beneficial. The current state of the art therefore does not yield a definitive answer to the question of whether using item reviews is actually beneficial for recommendation.

Keywords: recommender systems, item reviews, natural language processing, deep learning

1. Introduction

Traditionally, recommender systems utilize user ratings and item attributes to suggest items to users that are tailored to their preferences. To date, a large body of literature identifies user-generated item reviews (hereafter: item reviews) as a rich source of information that allows recommendation to be improved. The earliest systems that integrate item reviews emerged between 2005 and 2010 [1, 2, 3]. The rapid growth of machine learning, and deep learning in particular, put strong natural language processing techniques into the hands of recommender systems researchers to make use of item reviews.

Although the utilization of item reviews for recommendation generally leads to more accurate recommendations, and it therefore appears obvious that item reviews are beneficial for recommendation, the findings by Sachdeva and McAuley [4] call this view into question. They find that state-of-the-art systems that make use of item reviews often cannot outperform simple baseline systems. Notably, the difference between using and not using item reviews is often insignificant. They come to the conclusion that it is not at all clear whether and how item reviews benefit recommendation.

Intrigued by this conclusion, we set out to address the following research questions:

1. Are item reviews beneficial for recommendation?
2. In what situations are item reviews beneficial?
3. How are item reviews beneficial?

On the basis of a literature review, we arrive at the following position: It is important to understand what kind of information condensed in item reviews, if any, is beneficial for recommendation, and how that information can be leveraged. We now present the findings of our literature review.

2. Analysis

We present the underlying methodology of our literature review. We then touch on how the information condensed in item reviews can be represented. We close by pointing out goals and problems that have been addressed by leveraging item reviews.

[...] and problems that the authors addressed by leveraging the information condensed in item reviews.

An overview of our literature review can be found in the appendix. We refer the gentle reader to three survey papers [7, 8, 9] for further details on the utilization of item reviews for recommendation. Our literature review differs from these prior surveys in that we challenge the popular view that item reviews are beneficial for recommendation per se.

2.2. Review Representations

Item reviews are widely believed to be a rich source of information for recommendation. However, many distinct ways to utilize the information condensed in item reviews have been proposed in the literature. We adapt the list of widely used methods to extract and represent the information condensed in item reviews by Chen et al. [7] from 2015 to describe the current state of the art:

• Frequent Terms: Words extracted by statistical models according to their frequency.
• Keywords: Important descriptions that represent semantic information on items.
• Auxiliary Properties: Meta information such as the length and timestamp of an item review.
• Item Aspects: Fine-grained topics, such as the location and food quality of a restaurant, that are discussed in the item review.
• Aspect Sentiment: Combination of item aspect and user sentiment that represents user preferences that are not explicitly stated.
• Contextual Opinion: Opinions that vary with the context of item usage, e.g. visiting a restaurant during work or on a date.
• Term-based User/Item Profile: Profiles based on the terms used in item reviews that represent individual users or items.
• Review Embedding: The above hand-crafted approaches depend on human intervention. State-of-the-art deep learning methods such as deep encoders and transformer-based encoders make it possible to embed and represent item reviews as vectors without human intervention.

This list is not exhaustive, yet highlights the most popular approaches to represent item reviews. We now turn to the goals pursued by extracting and representing the information condensed in item reviews.
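
To make the hand-crafted representations above more concrete, the following minimal Python sketch builds a term-based item profile from frequent terms. It is only an illustration: the stop-word list, the function names `tokenize` and `term_profile`, and the three example reviews are our own assumptions and are not taken from any of the surveyed papers.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "it", "but", "very", "of", "to", "in"}


def tokenize(review):
    """Lowercase a review and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", review.lower()) if t not in STOP_WORDS]


def term_profile(reviews, top_k=3):
    """Return the top_k most frequent terms across all reviews of one item."""
    counts = Counter()
    for review in reviews:
        counts.update(tokenize(review))
    return counts.most_common(top_k)


# Made-up example reviews for a single restaurant.
reviews_for_item = [
    "The food was great and the location is central.",
    "Great food, but the service was slow.",
    "Central location, friendly service, great food.",
]

profile = term_profile(reviews_for_item)
print(profile)
```

A review embedding would replace `term_profile` by a learned encoder that maps each review to a vector; the frequency-based profile shown here is the simplest hand-crafted counterpart.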

2.3. Goals

A majority of relevant papers (25 out of 36 papers) aim to utilize item reviews for the improvement of recommendation performance. Apart from the primary goal of performance improvement, some authors have addressed secondary goals. We compile the following list of goals pursued by leveraging item reviews for recommendation:

• Performance Improvement: Improving recommendation performance with respect to the usual performance metrics.
• Recommendation Explanation: Explaining to the user why and how a recommendation is generated. Also referred to as ‘transparency’.
• Review Ranking: Ranking item reviews, for instance to filter them by their usefulness.
• Novel Systems: Creating novel recommender systems that do not fit into the main categories of collaborative filtering, content-based filtering, and knowledge-based systems or mixtures thereof.
• Context Inference: Inferring the context of a user on the basis of his or her item review.
• De-Biasing: Reducing, or ideally removing, bias such as gender or popularity bias.
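
As a minimal sketch of what performance improvement "with respect to the usual performance metrics" can mean for rating prediction, the following Python snippet computes the root mean squared error (RMSE) and the mean absolute error (MAE) for two sets of predictions. All ratings and predictions are made-up toy values, and the names `baseline_pred` and `review_pred` are hypothetical; the snippet illustrates the computation only and supports no claim about any actual system.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Made-up toy data: held-out ratings and two hypothetical sets of predictions.
actual = [4.0, 3.0, 5.0, 2.0]
baseline_pred = [3.5, 3.5, 4.0, 3.0]  # hypothetical model without item reviews
review_pred = [4.0, 3.5, 4.5, 2.5]    # hypothetical model with item reviews

for name, pred in [("baseline", baseline_pred), ("review-based", review_pred)]:
    print(f"{name}: RMSE={rmse(pred, actual):.3f}, MAE={mae(pred, actual):.3f}")
```

Whether a difference between two such scores is meaningful depends on the baseline, its tuning, and a significance test, which this sketch does not cover.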

This list is not exhaustive, yet highlights the most popular goals pursued by utilizing item reviews for recommendation. We now turn to the problems addressed by utilizing the information condensed in item reviews.

2.4. Problems

A number of recommender system implementations utilize item reviews to alleviate the traditional cold-start and sparsity problems. Beyond these widely addressed problems, various other niche problems have been addressed in the literature. We compile the following list of problems addressed in the context of utilizing item reviews for recommendation:

• Cold-Start: The problem that recommender systems may struggle to recommend new items or to recommend items of interest to new users.
• Sparsity: The problem that a large portion of user-item interactions such as ratings or clicks are unknown to a recommender system.
• Spurious Correlations: The problem that some correlations between items are only apparent in item reviews and not for instance in ratings.
• Review Ambiguity: The problem that item reviews can have different meanings depending on for instance the reviewer’s personality.
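
The two most widely addressed problems above can be stated in a few lines of code. The following Python sketch measures the sparsity of a set of user-item ratings and lists cold-start users and items. The function names, the `min_count` threshold, and the toy data are assumptions made for illustration only.

```python
def sparsity(ratings, n_users, n_items):
    """Fraction of user-item pairs for which no interaction is known."""
    return 1.0 - len(ratings) / (n_users * n_items)


def cold_start(ratings, all_ids, position, min_count=1):
    """Return IDs with fewer than min_count interactions.

    position is 0 to count per user and 1 to count per item.
    """
    counts = {identifier: 0 for identifier in all_ids}
    for key in ratings:
        counts[key[position]] += 1
    return sorted(i for i, c in counts.items() if c < min_count)


# Made-up toy data: known ratings keyed by (user, item).
ratings = {("u1", "i1"): 5, ("u1", "i2"): 3, ("u2", "i1"): 4}
users = ["u1", "u2", "u3"]
items = ["i1", "i2", "i3", "i4"]

s = sparsity(ratings, len(users), len(items))
cold_users = cold_start(ratings, users, position=0)
cold_items = cold_start(ratings, items, position=1)
print(s, cold_users, cold_items)
```

In this toy example, 3 of 12 possible user-item pairs are known, one user has no ratings, and two items have never been rated.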

This list is not exhaustive, yet highlights the most popular problems that have been addressed by utilizing item reviews for recommendation. We now discuss the research questions put forth in the introduction.

3. Discussion

3.3. Limitation and Future Direction

We hold that whether or not item reviews are beneficial for recommendation can only be decided by proving the following three claims. First, item reviews actually contain information useful for recommendation. Second, the usefulness of an item review can be identified. And third, that useful information can be extracted. Interestingly, it is widely assumed that item reviews contain useful information. However, item review-based features do not always provide useful information [6].

The second and third claims are usually shown by evaluating the effectiveness with which a goal (see Section 2.3) or a problem (see Section 2.4) is addressed by using item reviews. Since the first claim is never established, we cannot conclude that item reviews are actually beneficial for recommendation. We can only conclude that item reviews can be beneficial for recommendation, as underpinned empirically by Sachdeva and McAuley [4]. Therefore, we cannot clearly answer Research Questions 2 and 3. We thus take a closer look at the popular goal of performance improvement using item reviews.

3.2. Performance Improvement Using Item Reviews

Recommendation performance improves through higher accuracy if the use of item reviews makes a recommender system's results better suited to the task at hand, that is, if it yields lower error rates and better overall evaluation results. Item reviews can be profitably exploited towards this goal. Another measure of performance is the robustness of systems. This relates to the question of whether there are improvements in the way that typical problems of recommender systems are faced (see Section 2.4). As discussed above, this is another area where item reviews are commonly utilized.

Recommender systems achieve higher accuracy and robustness from the utilization of item reviews. Generally, researchers exploit item reviews in order to improve the results of existing recommendation models. Recommender systems based only on item reviews are rare, and those which we found are often meant to be embedded into a larger recommender system.

4. Conclusion

We address the question of whether, and under which circumstances, recommendation benefits from the use of user-generated item reviews. Towards this goal, we identify and analyze 36 papers that leverage item reviews for recommendation published between 2010 and 2022. We do not find clear indications in the literature of the circumstances in which item reviews can be considered to be consistently beneficial for recommendation.

The literature clearly shows that utilizing item reviews can be beneficial for recommendation. However, the literature fails to show when utilizing item reviews benefits recommendation and why. The widespread belief that using item reviews for recommendation is beneficial per se hampers a deeper understanding of whether or not this belief holds true. The benefit of using item reviews remains ambiguous. We therefore argue that the field needs to first establish a basic understanding of why and how item reviews can benefit recommendation rather than showing that they potentially can.

Acknowledgments

The authors would like to thank Alana Diebitsch and Jan Tovar for their help in collecting and reviewing the papers that formed the basis of our literature review.

References

[1] S. Aciar, D. Zhang, S. Simoff, J. Debenham, Informed recommender: Basing recommendations on consumer product reviews, IEEE Intelligent Systems 22 (2007) 39–47. doi:10.1109/MIS.2007.55.
[2] A. Yates, J. Joseph, A.-M. Popescu, A. D. Cohn, N. Sillick, Shopsmart: Product recommendations through technical specifications and user reviews, in: Proc. of the 17th ACM Conf. on Information and Knowledge Management, ACM, 2008, pp. 1501–1502. doi:10.1145/1458082.1458355.
[3] N. Jakob, S. H. Weber, M. C. Müller, I. Gurevych, Beyond the stars: Exploiting free-text user reviews to improve the accuracy of movie recommendations, in: Proc. of the 1st Int. CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion, ACM, 2009, pp. 57–64. doi:10.1145/1651461.1651473.
[4] N. Sachdeva, J. McAuley, How useful are reviews for recommendation? a critical review and potential improvements, in: Proc. of the 43rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2020, pp. 1845–1848. doi:10.1145/3397271.3401281.
[5] G. Penha, C. Hauff, What does bert know about books, movies and music? probing bert for conversational recommendation, in: Proc. of the 14th ACM Conf. on Recommender Systems, ACM, 2020, pp. 388–397. doi:10.1145/3383313.3412249.
[6] T. Eichinger, Reviews are gold!? on the link between item reviews and item preferences, in: Joint Workshop Proc. of the 3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) and the 5th Edition of Recommendation in Complex Environments (ComplexRec), CEUR-WS, 2021. URL: http://ceur-ws.org/Vol-2960/paper2.pdf.
[7] L. Chen, G. Chen, F. Wang, Recommender systems based on user reviews: the state of the art, User Modeling and User-Adapted Interaction 25 (2015) 99–154. doi:10.1007/s11257-015-9155-5.
[8] S. M. Al-Ghuribi, S. A. Mohd Noah, Multi-criteria review-based recommender system–the state of the art, IEEE Access 7 (2019) 169446–169468. doi:10.1109/ACCESS.2019.2954861.
[9] M. Hernández-Rubio, I. Cantador, A. Bellogín, A comparative analysis of recommender systems based on item aspect opinions extracted from user reviews, User Modeling and User-Adapted Interaction 29 (2019) 381–441. doi:10.1007/s11257-018-9214-9.
[10] M. Ferrari Dacrema, P. Cremonesi, D. Jannach, Are we really making much progress? a worrying analysis of recent neural recommendation approaches, in: Proc. of the 13th ACM Conf. on Recommender Systems, ACM, 2019, pp. 101–109. doi:10.1145/3298689.3347058.
[11] M. Terzi, M.-A. Ferrario, J. Whittle, Free text in user reviews: Their role in recommender systems, in: Workshop on Recommender Systems and the Social Web, 2011, pp. 45–48.
[12] R. Zhang, T. Tran, An information gain-based approach for recommending useful product reviews, Knowledge and Information Systems 26 (2011) 419–434. doi:10.1007/s10115-010-0287-y.
[13] G. Ling, M. R. Lyu, I. King, Ratings meet reviews, a combined approach to recommend, in: Proc. of the 8th ACM Conf. on Recommender systems, ACM, 2014, pp. 105–112. doi:10.1145/2645710.2645728.
[14] G. Chen, L. Chen, Augmenting service recommender systems by incorporating contextual opinions from user reviews, User Modeling and User-Adapted Interaction 25 (2015) 295–329.
[15] Y. Zhang, Incorporating phrase-level sentiment analysis on textual reviews for personalized recommendation, in: Proc. of the 8th ACM Int. Conf. on Web Search and Data Mining, ACM, 2015, pp. 435–440. doi:10.1145/2684822.2697033.
[16] L. Zheng, V. Noroozi, P. S. Yu, Joint deep modeling of users and items using reviews for recommendation, in: Proc. of the 10th ACM Int. Conf. on Web Search and Data Mining, ACM, 2017, pp. 425–434. doi:10.1145/3018661.3018665.
[17] S. Seo, J. Huang, H. Yang, Y. Liu, Interpretable convolutional neural networks with dual local and global attention for review rating prediction, in: Proc. of the 11th ACM Conf. on Recommender Systems, ACM, 2017, pp. 297–305. doi:10.1145/3109859.3109890.
[18] D. Paul, S. Sarkar, M. Chelliah, C. Kalyan, P. P. Sinai Nadkarni, Recommendation of high quality representative reviews in e-commerce, in: Proc. of the 11th ACM Conf. on Recommender Systems, ACM, 2017, pp. 311–315. doi:10.1145/3109859.3109901.
[19] C. Musto, M. de Gemmis, G. Semeraro, P. Lops, A multi-criteria recommender system exploiting aspect-based sentiment analysis of users’ reviews, in: Proc. of the 11th ACM Conf. on Recommender Systems, ACM, 2017, pp. 321–325. doi:10.1145/3109859.3109905.
[20] F. Lahlou, H. Benbrahim, I. Kassou, Textual context aware factorization machines: Improving recommendation by leveraging users’ reviews, in: Proc. of the 2nd Int. Conf. on Smart Digital Environment, ACM, 2018, pp. 64–69. doi:10.1145/3289100.3289111.
[21] Y. Lu, R. Dong, B. Smyth, Why i like it: Multi-task learning for recommendation and explanation, in: Proc. of the 12th ACM Conf. on Recommender Systems, ACM, 2018, pp. 4–12. doi:10.1145/3240323.3240365.
[22] D. Hyun, C. Park, M.-C. Yang, I. Song, J.-T. Lee, H. Yu, Review sentiment-guided scalable deep recommender system, in: The 41st Int. ACM SIGIR Conf. on Research & Development in Information Retrieval, ACM, 2018, pp. 965–968. doi:10.1145/3209978.3210111.
[23] P. Bhagat, J. D. Pawar, A comparative study of feature extraction methods from user reviews for recommender systems, in: Proc. of the ACM India Joint Int. Conf. on Data Science and Management of Data, ACM, 2018, pp. 325–328. doi:10.1145/3152494.3167982.
[24] H. Xia, Z. Wang, B. Du, L. Zhang, S. Chen, G. Chun, Leveraging ratings and reviews with gating mechanism for recommendation, in: Proc. of the 28th ACM Int. Conf. on Information and Knowledge Management, ACM, 2019, pp. 1573–1582. doi:10.1145/3357384.3357919.
[25] G. Alexandridis, T. Tagaris, G. Siolas, A. Stafylopatis, From free-text user reviews to product recommendation using paragraph vectors and matrix factorization, in: Companion Proc. of The 2019 World Wide Web Conf., ACM, 2019, pp. 335–343. doi:10.1145/3308560.3316601.
[26] J. Ni, J. Li, J. McAuley, Justifying recommendations using distantly-labeled reviews and fine-grained aspects, in: Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing and the 9th Int. Joint Conf. on Natural Language Processing (EMNLP-IJCNLP), ACL, 2019, pp. 188–197. doi:10.18653/v1/D19-1018.
[27] G. Wu, K. Luo, S. Sanner, H. Soh, Deep language-based critiquing for recommender systems, in: Proc. of the 13th ACM Conf. on Recommender Systems, ACM, 2019, pp. 137–145. doi:10.1145/3298689.3347009.
[28] D. Rafailidis, F. Crestani, Adversarial training for review-based recommendations, in: Proc. of the 42nd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2019, pp. 1057–1060. doi:10.1145/3331184.3331313.
[29] A. Salah, Q.-T. Truong, H. W. Lauw, Cornac: A comparative framework for multimodal recommender systems, JMLR 21 (2020) 1–5. URL: http://jmlr.org/papers/v21/19-805.html.
[30] F. J. Peña, D. O’Reilly-Morgan, E. Z. Tragos, N. Hurley, E. Duriakova, B. Smyth, A. Lawlor, Combining rating and review data by initializing latent factor models with topic models for top-n recommendation, in: Proc. of the 14th ACM Conf. on Recommender Systems, ACM, 2020, pp. 438–443. doi:10.1145/3383313.3412207.
[31] J. P. Zhou, Z. Cheng, F. Perez, M. Volkovs, Tafa: Two-headed attention fused autoencoder for context-aware recommendations, in: Proc. of the 14th ACM Conf. on Recommender Systems, ACM, 2020, pp. 338–347. doi:10.1145/3383313.3412268.
[32] K. Luo, H. Yang, G. Wu, S. Sanner, Deep critiquing for vae-based recommender systems, in: Proc. of the 43rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2020, pp. 1269–1278. doi:10.1145/3397271.3401091.
[33] H. Liu, W. Wang, H. Xu, Q. Peng, P. Jiao, Neural unified review recommendation with cross attention, in: Proc. of the 43rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2020, pp. 1789–1792. doi:10.1145/3397271.3401249.
[34] D. Antognini, C. Musat, B. Faltings, Interacting with explanations through critiquing, in: Proc. of the 30th Int. Joint Conf. on Artificial Intelligence, 2021, pp. 515–521. doi:10.24963/ijcai.2021/72.
[35] T. K. Aslanyan, F. Frasincar, Utilizing textual reviews in latent factor models for recommender systems, in: Proc. of the 36th Annual ACM Symposium on Applied Computing, ACM, 2021, pp. 1931–1940.
[36] I. Kostric, K. Balog, F. Radlinski, Soliciting user preferences in conversational recommender systems via usage-related questions, in: Proc. of the 15th ACM Conf. on Recommender Systems, ACM, 2021, pp. 724–729. doi:10.1145/3460231.3478861.
[37] C. Lin, X. Liu, G. Xv, H. Li, Mitigating sentiment bias for recommender systems, in: Proc. of the 44th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2021, pp. 31–40. doi:10.1145/3404835.3462943.
[38] X. Wang, I. Ounis, C. Macdonald, Leveraging review properties for effective recommendation, in: Proc. of the Web Conf. 2021, ACM, 2021, pp. 2209–2219. doi:10.1145/3442381.3450038.
[39] S. Pan, D. Li, H. Gu, T. Lu, X. Luo, N. Gu, Accurate and explainable recommendation via review rationalization, in: Proc. of the ACM Web Conf. 2022, ACM, 2022, pp. 3092–3101. doi:10.1145/3485447.3512029.
[40] Y. Zhang, W. Zuo, Z. Shi, B. K. Adhikari, Integrating reviews and ratings into graph neural networks for rating prediction, Journal of Ambient Intelligence and Humanized Computing (2022). doi:10.1007/s12652-021-03626-7.

A. Appendix

We present a tabular overview of our categorization of the 36 papers we find relevant for the convenience of the gentle reader.
