Are User-Generated Item Reviews Actually Beneficial for Recommendation?

Tzu-Hua Kao1,†, Lea Dahm1,† and Tobias Eichinger1,*
1 Technische Universität Berlin, Straße des 17. Juni 135, Berlin, 10623, Germany

Abstract
User-generated item reviews are widely believed to represent a valuable source of information for recommendation. However, a recent empirical analysis of review-based algorithms by Sachdeva and McAuley puts this belief into question. In this paper, we analyze the recommender systems literature that seeks to improve recommendation by using item reviews as auxiliary information. We identify the ways in which the information condensed in item reviews is represented. We then point out particular goals, such as performance improvement, and problems, such as cold-start and sparsity, that have been addressed by using item reviews. We arrive at the same conclusion as Sachdeva and McAuley that item reviews can be beneficial, yet are not beneficial per se. The field is saturated with methods that leverage item reviews yet lacks studies on when and why certain methods are beneficial. The current state-of-the-art therefore does not yield a definitive answer to the question whether using item reviews is actually beneficial for recommendation.

Keywords
recommender systems, item reviews, natural language processing, deep learning

4th Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) Workshop @ RecSys 2022, September 18–23, 2022, Seattle, WA, USA.
* Corresponding author.
† These authors contributed equally.
Email: tzu-hua.kao@campus.tu-berlin.de (T. Kao); lea.dahm@campus.tu-berlin.de (L. Dahm); tobias.eichinger@tu-berlin.de (T. Eichinger)
URL: https://www.snet.tu-berlin.de/menue/team/tobias_eichinger/ (T. Eichinger)
ORCID: 0000-0002-8351-2823 (T. Eichinger)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

Traditionally, recommender systems utilize user ratings and item attributes to suggest items to users that are tailored to their preferences. To date, a large body of literature identifies user-generated item reviews (hereafter: item reviews) as a rich source of information that can be used to improve recommendation. The earliest systems that integrate item reviews emerged between 2005 and 2010 [1, 2, 3]. The rapid growth of machine learning, and deep learning in particular, put strong natural language processing techniques into the hands of recommender systems researchers to make use of item reviews.

Although the utilization of item reviews for recommendation generally leads to more accurate recommendations, and it therefore appears obvious that item reviews are beneficial for recommendation, the findings by Sachdeva and McAuley [4] put this view into question. They find that state-of-the-art systems that make use of item reviews often cannot outperform simple baseline systems. Notably, the difference between using and not using item reviews is often insignificant. They come to the conclusion that it is not at all clear whether and how item reviews benefit recommendation.

Intrigued by this conclusion, we set out to address the following research questions:

1. Are item reviews beneficial for recommendation?
2. In what situations are item reviews beneficial?
3. How are item reviews beneficial?

On the basis of a literature review, we arrive at the following position: It is important to understand what kind of information condensed in item reviews, if any, is beneficial for recommendation, and how that information can be leveraged. We now present the findings of our literature review.

2. Analysis

We present the underlying methodology of our literature review. We then touch on how the information condensed in item reviews can be represented. We close by pointing out goals and problems that have been addressed by leveraging item reviews.

2.1. Methodology

We first searched for papers starting from three recent papers that leverage item reviews for recommendation: [4, 5, 6]. Based on title and abstract, we then collected a sample of 50 papers for further reviewing. After two rounds of filtering the papers for relevance, we found only 36 papers relevant. We first sorted the papers by publication year. We then labeled each paper by the way that item reviews were used. Finally, we applied labels for the goals and problems that the authors addressed by leveraging the information condensed in item reviews.

An overview of our literature review can be found in the appendix. We refer the reader to three survey papers [7, 8, 9] for further details on the utilization of item reviews for recommendation. Our literature review differs from these prior surveys in that we challenge the popular view that item reviews are beneficial for recommendation per se.

2.2. Review Representations

Item reviews are widely believed to be a rich source of information for recommendation. However, many distinct ways to utilize the information condensed in item reviews have been proposed in the literature. We adapt the list of widely used methods to extract and represent the information condensed in item reviews by Chen et al. [7] from 2015 to describe the current state-of-the-art:

• Frequent Terms: Words extracted by statistical models according to their frequency.
• Keywords: Important descriptions that represent semantic information on items.
• Auxiliary Properties: Meta information such as the length and timestamp of an item review.
• Item Aspects: Fine-grained topics such as the location and food quality of a restaurant, which are discussed in the item review.
• Aspect Sentiment: Combination of item aspect and user sentiment that represents user preferences that are not explicitly stated.
• Contextual Opinion: Opinions that vary with the context of item usage, e.g. visiting a restaurant during work or on a date.
• Term-based User/Item Profile: Profiles based on the terms used in item reviews that represent individual users or items.
• Review Embedding: The above hand-crafted approaches depend on human intervention. State-of-the-art deep learning methods such as deep encoders and transformer-based encoders make it possible to embed and represent item reviews as vectors without human intervention.

This list is not exhaustive, yet highlights the most popular approaches to represent item reviews. We now turn to the goals pursued by extracting and representing the information condensed in item reviews.

2.3. Goals

A majority of relevant papers (25 out of 36 papers) aim to utilize item reviews for the improvement of recommendation performance. Apart from the primary goal of performance improvement, some authors have addressed minor goals. We compile the following list of goals pursued by leveraging item reviews for recommendation:

• Performance Improvement: Improving recommendation performance with respect to the usual performance metrics.
• Recommendation Explanation: Explaining to the user why and how a recommendation is generated. Also referred to as 'transparency'.
• Review Ranking: Ranking item reviews, for instance to filter item reviews by their usefulness.
• Novel Systems: Creating novel recommender systems that do not fit into the main categories of collaborative filtering, content-based filtering, and knowledge-based systems or mixtures thereof.
• Context Inference: Inferring the context of a user on the basis of his or her item review.
• De-Biasing: Reducing, or ideally removing, bias such as gender or popularity bias.

This list is not exhaustive, yet highlights the most popular goals pursued by utilizing item reviews for recommendation. We now turn to the problems addressed by utilizing the information condensed in item reviews.

2.4. Problems

A number of recommender systems implementations utilize item reviews to alleviate the traditional cold-start and sparsity problems. Beyond these widely addressed problems, various other niche problems have been addressed in the literature. We compile the following list of problems addressed in the context of utilizing item reviews for recommendation:

• Cold-Start: The problem that recommender systems may struggle to recommend new items and/or to recommend items of interest to new users.
• Sparsity: The problem that a large portion of user-item interactions such as ratings or clicks are unknown to a recommender system.
• Spurious Correlations: The problem that some correlations between items are only apparent in item reviews and not, for instance, in ratings.
• Review Ambiguity: The problem that item reviews can have different meanings depending on, for instance, the reviewer's personality.

This list is not exhaustive, yet highlights the most popular problems that have been addressed by utilizing item reviews for recommendation. We now discuss the research questions put forth in the introduction.

3. Discussion

We first discuss the general benefit of using item reviews for recommendation. We then focus on the popular use of item reviews for performance improvement. We close with what we conclude to be the main limitations of the current state-of-the-art and point out a future direction.

3.1. General Benefit of Using Item Reviews

We hold that whether or not item reviews are beneficial for recommendation can only be decided by proving the following three claims. First, item reviews actually contain information useful for recommendation. Second, the usefulness of an item review can be identified. Third, that useful information can be extracted. Interestingly, it is widely assumed that item reviews contain useful information. However, item review-based features do not always carry useful information [6].

The second and third claims are usually shown by evaluating the effectiveness with which a goal (see Section 2.3) or a problem (see Section 2.4) is addressed by using item reviews. Since the first claim is never established, we cannot conclude that item reviews are actually beneficial for recommendation. We can only conclude that item reviews can be beneficial for recommendation, as underpinned empirically by Sachdeva and McAuley [4]. Therefore, we cannot clearly answer Research Questions 2 and 3. We thus take a closer look at the popular goal of performance improvement using item reviews.

3.2. Performance Improvement Using Item Reviews

Recommendation performance improves through higher accuracy if the results of a recommender system are better suited to the task at hand due to the use of item reviews, meaning lower error rates and better overall evaluation results. Item reviews can be profitably exploited towards this goal. Another measure of performance is the robustness of systems. This relates to the question whether there are improvements in the way that typical problems of recommender systems are faced (see Section 2.4). As discussed above, this is another area where item reviews are commonly utilized.

Recommender systems can achieve higher accuracy and robustness from the utilization of item reviews. Generally, researchers exploit item reviews in order to improve the results of existing recommendation models. Recommender systems based only on item reviews are rare, and those which we found are often meant to be embedded into a larger recommender system.

3.3. Limitation and Future Direction

We find that representing item reviews as a combination of item aspects and aspect sentiment (see Section 2.2) has received particular attention as of late. The field moves towards ever more sophisticated methods that leverage item reviews. These more sophisticated methods are often simply believed to be superior to traditional methods. Research on the advantages or disadvantages of approaches towards item review utilization is rare.

It is unclear whether less popular methods are employed and compared against less often because they are less effective or whether they are simply believed to be less effective. It would not be the first time that technically sophisticated methods in recommendation are simply believed to be superior to traditional methods without properly showing that this is the case [10]. We argue that it is helpful to study item review representations independently from goals and problems.

4. Conclusion

We address the question if, and under which circumstances, recommendation benefits from the use of user-generated item reviews. Towards this goal, we identify and analyze 36 papers that leverage item reviews for recommendation published between 2010 and 2022. We do not find clear indications in the literature of the circumstances in which item reviews can be considered to be consistently beneficial for recommendation.

The literature clearly shows that utilizing item reviews can be beneficial for recommendation. However, the literature fails to show when utilizing item reviews benefits recommendation and why. The widespread belief that using item reviews for recommendation is beneficial per se hampers a deeper understanding of whether or not this belief holds true. The benefit of using item reviews remains ambiguous. We therefore argue that the field needs to first establish a basic understanding of why and how item reviews can benefit recommendation rather than showing that they potentially can.

Acknowledgments

The authors would like to thank Alana Diebitsch and Jan Tovar for their help in collecting and reviewing the papers that formed the basis of our literature review.

References

[1] S. Aciar, D. Zhang, S. Simoff, J. Debenham, Informed recommender: Basing recommendations on consumer product reviews, IEEE Intelligent Systems 22 (2007) 39–47.
doi:10.1109/MIS.2007.55.
[2] A. Yates, J. Joseph, A.-M. Popescu, A. D. Cohn, N. Sillick, Shopsmart: Product recommendations through technical specifications and user reviews, in: Proc. of the 17th ACM Conf. on Information and Knowledge Management, ACM, 2008, pp. 1501–1502. doi:10.1145/1458082.1458355.
[3] N. Jakob, S. H. Weber, M. C. Müller, I. Gurevych, Beyond the stars: Exploiting free-text user reviews to improve the accuracy of movie recommendations, in: Proc. of the 1st Int. CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion, ACM, 2009, pp. 57–64. doi:10.1145/1651461.1651473.
[4] N. Sachdeva, J. McAuley, How useful are reviews for recommendation? a critical review and potential improvements, in: Proc. of the 43rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2020, pp. 1845–1848. doi:10.1145/3397271.3401281.
[5] G. Penha, C. Hauff, What does bert know about books, movies and music? probing bert for conversational recommendation, in: Proc. of the 14th ACM Conf. on Recommender Systems, ACM, 2020, pp. 388–397. doi:10.1145/3383313.3412249.
[6] T. Eichinger, Reviews are gold!? on the link between item reviews and item preferences, in: Joint Workshop Proc. of the 3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) and the 5th Edition of Recommendation in Complex Environments (ComplexRec), CEUR-WS, 2021. URL: http://ceur-ws.org/Vol-2960/paper2.pdf.
[7] L. Chen, G. Chen, F. Wang, Recommender systems based on user reviews: the state of the art, User Modeling and User-Adapted Interaction 25 (2015) 99–154. doi:10.1007/s11257-015-9155-5.
[8] S. M. Al-Ghuribi, S. A. Mohd Noah, Multi-criteria review-based recommender system–the state of the art, IEEE Access 7 (2019) 169446–169468. doi:10.1109/ACCESS.2019.2954861.
[9] M. Hernández-Rubio, I. Cantador, A. Bellogín, A comparative analysis of recommender systems based on item aspect opinions extracted from user reviews, User Modeling and User-Adapted Interaction 29 (2019) 381–441. doi:10.1007/s11257-018-9214-9.
[10] M. Ferrari Dacrema, P. Cremonesi, D. Jannach, Are we really making much progress? a worrying analysis of recent neural recommendation approaches, in: Proc. of the 13th ACM Conf. on Recommender Systems, ACM, 2019, pp. 101–109. doi:10.1145/3298689.3347058.
[11] M. Terzi, M.-A. Ferrario, J. Whittle, Free text in user reviews: Their role in recommender systems, in: Workshop on Recommender Systems and the Social Web, 2011, pp. 45–48.
[12] R. Zhang, T. Tran, An information gain-based approach for recommending useful product reviews, Knowledge and Information Systems 26 (2011) 419–434. doi:10.1007/s10115-010-0287-y.
[13] G. Ling, M. R. Lyu, I. King, Ratings meet reviews, a combined approach to recommend, in: Proc. of the 8th ACM Conf. on Recommender systems, ACM, 2014, pp. 105–112. doi:10.1145/2645710.2645728.
[14] G. Chen, L. Chen, Augmenting service recommender systems by incorporating contextual opinions from user reviews, User Modeling and User-Adapted Interaction 25 (2015) 295–329.
[15] Y. Zhang, Incorporating phrase-level sentiment analysis on textual reviews for personalized recommendation, in: Proc. of the 8th ACM Int. Conf. on Web Search and Data Mining, ACM, 2015, pp. 435–440. doi:10.1145/2684822.2697033.
[16] L. Zheng, V. Noroozi, P. S. Yu, Joint deep modeling of users and items using reviews for recommendation, in: Proc. of the 10th ACM Int. Conf. on Web Search and Data Mining, ACM, 2017, pp. 425–434. doi:10.1145/3018661.3018665.
[17] S. Seo, J. Huang, H. Yang, Y. Liu, Interpretable convolutional neural networks with dual local and global attention for review rating prediction, in: Proc. of the 11th ACM Conf. on Recommender Systems, ACM, 2017, pp. 297–305. doi:10.1145/3109859.3109890.
[18] D. Paul, S. Sarkar, M. Chelliah, C. Kalyan, P. P. Sinai Nadkarni, Recommendation of high quality representative reviews in e-commerce, in: Proc. of the 11th ACM Conf. on Recommender Systems, ACM, 2017, pp. 311–315. doi:10.1145/3109859.3109901.
[19] C. Musto, M. de Gemmis, G. Semeraro, P. Lops, A multi-criteria recommender system exploiting aspect-based sentiment analysis of users' reviews, in: Proc. of the 11th ACM Conf. on Recommender Systems, ACM, 2017, pp. 321–325. doi:10.1145/3109859.3109905.
[20] F. Lahlou, H. Benbrahim, I. Kassou, Textual context aware factorization machines: Improving recommendation by leveraging users' reviews, in: Proc. of the 2nd Int. Conf. on Smart Digital Environment, ACM, 2018, pp. 64–69. doi:10.1145/3289100.3289111.
[21] Y. Lu, R. Dong, B. Smyth, Why i like it: Multi-task learning for recommendation and explanation, in: Proc. of the 12th ACM Conf. on Recommender Systems, ACM, 2018, pp. 4–12. doi:10.1145/3240323.3240365.
[22] D. Hyun, C. Park, M.-C. Yang, I. Song, J.-T. Lee, H. Yu, Review sentiment-guided scalable deep recommender system, in: The 41st Int. ACM SIGIR Conf. on Research & Development in Information Retrieval, ACM, 2018, pp. 965–968. doi:10.1145/3209978.3210111.
[23] P. Bhagat, J. D. Pawar, A comparative study of feature extraction methods from user reviews for recommender systems, in: Proc. of the ACM India Joint Int. Conf. on Data Science and Management of Data, ACM, 2018, pp. 325–328. doi:10.1145/3152494.3167982.
[24] H. Xia, Z. Wang, B. Du, L. Zhang, S. Chen, G. Chun, Leveraging ratings and reviews with gating mechanism for recommendation, in: Proc. of the 28th ACM Int. Conf. on Information and Knowledge Management, ACM, 2019, pp. 1573–1582. doi:10.1145/3357384.3357919.
[25] G. Alexandridis, T. Tagaris, G. Siolas, A. Stafylopatis, From free-text user reviews to product recommendation using paragraph vectors and matrix factorization, in: Companion Proc. of The 2019 World Wide Web Conf., ACM, 2019, pp. 335–343. doi:10.1145/3308560.3316601.
[26] J. Ni, J. Li, J. McAuley, Justifying recommendations using distantly-labeled reviews and fine-grained aspects, in: Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing and the 9th Int. Joint Conf. on Natural Language Processing (EMNLP-IJCNLP), ACL, 2019, pp. 188–197. doi:10.18653/v1/D19-1018.
[27] G. Wu, K. Luo, S. Sanner, H. Soh, Deep language-based critiquing for recommender systems, in: Proc. of the 13th ACM Conf. on Recommender Systems, ACM, 2019, pp. 137–145. doi:10.1145/3298689.3347009.
[28] D. Rafailidis, F. Crestani, Adversarial training for review-based recommendations, in: Proc. of the 42nd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2019, pp. 1057–1060. doi:10.1145/3331184.3331313.
[29] A. Salah, Q.-T. Truong, H. W. Lauw, Cornac: A comparative framework for multimodal recommender systems, JMLR 21 (2020) 1–5. URL: http://jmlr.org/papers/v21/19-805.html.
[30] F. J. Peña, D. O'Reilly-Morgan, E. Z. Tragos, N. Hurley, E. Duriakova, B. Smyth, A. Lawlor, Combining rating and review data by initializing latent factor models with topic models for top-n recommendation, in: Proc. of the 14th ACM Conf. on Recommender Systems, ACM, 2020, pp. 438–443. doi:10.1145/3383313.3412207.
[31] J. P. Zhou, Z. Cheng, F. Perez, M. Volkovs, Tafa: Two-headed attention fused autoencoder for context-aware recommendations, in: Proc. of the 14th ACM Conf. on Recommender Systems, ACM, 2020, pp. 338–347. doi:10.1145/3383313.3412268.
[32] K. Luo, H. Yang, G. Wu, S. Sanner, Deep critiquing for vae-based recommender systems, in: Proc. of the 43rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2020, pp. 1269–1278. doi:10.1145/3397271.3401091.
[33] H. Liu, W. Wang, H. Xu, Q. Peng, P. Jiao, Neural unified review recommendation with cross attention, in: Proc. of the 43rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2020, pp. 1789–1792. doi:10.1145/3397271.3401249.
[34] D. Antognini, C. Musat, B. Faltings, Interacting with explanations through critiquing, in: Proc. of the 30th Int. Joint Conf. on Artificial Intelligence, 2021, pp. 515–521. doi:10.24963/ijcai.2021/72.
[35] T. K. Aslanyan, F. Frasincar, Utilizing textual reviews in latent factor models for recommender systems, in: Proc. of the 36th Annual ACM Symposium on Applied Computing, ACM, 2021, pp. 1931–1940.
[36] I. Kostric, K. Balog, F. Radlinski, Soliciting user preferences in conversational recommender systems via usage-related questions, in: Proc. of the 15th ACM Conf. on Recommender Systems, ACM, 2021, pp. 724–729. doi:10.1145/3460231.3478861.
[37] C. Lin, X. Liu, G. Xv, H. Li, Mitigating sentiment bias for recommender systems, in: Proc. of the 44th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2021, pp. 31–40. doi:10.1145/3404835.3462943.
[38] X. Wang, I. Ounis, C. Macdonald, Leveraging review properties for effective recommendation, in: Proc. of the Web Conf. 2021, ACM, 2021, pp. 2209–2219. doi:10.1145/3442381.3450038.
[39] S. Pan, D. Li, H. Gu, T. Lu, X. Luo, N. Gu, Accurate and explainable recommendation via review rationalization, in: Proc. of the ACM Web Conf. 2022, ACM, 2022, pp. 3092–3101. doi:10.1145/3485447.3512029.
[40] Y. Zhang, W. Zuo, Z. Shi, B. K. Adhikari, Integrating reviews and ratings into graph neural networks for rating prediction, Journal of Ambient Intelligence and Humanized Computing (2022). doi:10.1007/s12652-021-03626-7.

A. Appendix

We present a tabular overview of our categorization of the 36 papers we find relevant for the convenience of the reader.

Table 1
List of relevant papers analyzed in the literature review sorted by year of publication. Full dots mark that an item in any of the three categories Review Representations (see Section 2.2), Goals (see Section 2.3), and Problems (see Section 2.4) has been addressed in a paper. Numbers in parentheses indicate the overall number of occurrences of an item.
Columns, grouped by category:
Review Representations: Frequent Terms (4), Keywords (2), Auxiliary Properties (3), Item Aspects (20), Aspect Sentiment (14), Contextual Opinion (1), Term-based User/Item Profile (8), Review Embedding (7)
Goals: Performance Improvement (25), Recommendation Explanation (9), Review Ranking (4), Novel Systems (3), Context Inference (4), De-Biasing (3)
Problems: Cold-Start (4), Sparsity (5), Spurious Correlations (1), Review Ambiguity (1)

Rows (papers by year):
2010: Terzi et al. [11]
2011: Zhang and Tran [12]
2014: Ling et al. [13]
2015: Chen et al. [7]; Chen and Chen [14]; Zhang [15]
2017: Zheng et al. [16]; Seo et al. [17]; Paul et al. [18]; Musto et al. [19]
2018: Lahlou et al. [20]; Lu et al. [21]; Hyun et al. [22]; Bhagat and Pawar [23]
2019: Hernández-Rubio et al. [9]; Xia et al. [24]; Alexandridis et al. [25]; Al-Ghuribi and Noah [8]; Ni et al. [26]; Wu et al. [27]; Rafailidis and Crestani [28]
2020: Salah et al. [29]; Sachdeva and McAuley [4]; Penha and Hauff [5]; Peña et al. [30]; Zhou et al. [31]; Luo et al. [32]; Liu et al. [33]
2021: Antognini et al. [34]; Eichinger [6]; Aslanyan and Frasincar [35]; Kostric et al. [36]; Lin et al. [37]; Wang et al. [38]
2022: Pan et al. [39]; Zhang et al. [40]

[Per-paper marks not preserved.]
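
To make the two simplest representations from Section 2.2 more concrete, the following is a minimal Python sketch of Frequent Terms and a Term-based Item Profile built from raw term counts. It is an illustration only and not the method of any surveyed paper: the review texts, the stop-word list, the tokenization rule, and the function names `tokenize`, `frequent_terms`, and `item_profiles` are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; real systems use larger, curated lists.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "but", "very", "for", "of", "to"}


def tokenize(text):
    """Lowercase a review and split it into alphabetic tokens, dropping stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def frequent_terms(reviews, k=3):
    """Return the k most frequent terms over all reviews (Frequent Terms)."""
    counts = Counter()
    for _item, text in reviews:
        counts.update(tokenize(text))
    return counts.most_common(k)


def item_profiles(reviews):
    """Build one term-count profile per item (Term-based Item Profile)."""
    profiles = defaultdict(Counter)
    for item, text in reviews:
        profiles[item].update(tokenize(text))
    return dict(profiles)


# Toy (item, review text) pairs, invented for illustration only.
REVIEWS = [
    ("restaurant_a", "The food was great and the location is great."),
    ("restaurant_a", "Great food, but slow service."),
    ("restaurant_b", "Slow service and bland food."),
]

if __name__ == "__main__":
    print(frequent_terms(REVIEWS))
    print(item_profiles(REVIEWS)["restaurant_b"])
```

Raw counts are the plainest choice here. A weighting scheme or a learned review embedding could replace the `Counter` profiles without changing the surrounding structure, which is one way to compare representations independently of goals and problems, as argued in Section 3.3.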