Reviews Are Gold!? On the Link between Item Reviews and Item Preferences

Tobias Eichinger, Technical University of Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany
tobias.eichinger@tu-berlin.de

3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) & 5th Edition of Recommendation in Complex Environments (ComplexRec) Joint Workshop @ RecSys 2021, September 27 - October 1, 2021, Amsterdam, Netherlands

Abstract
User-user similarities in recommender systems are traditionally assessed on co-rated items. As ratings encode item preferences, similarities on co-rated items capture similarities in item preferences. However, a majority of similarities are undefined, as particularly small profiles seldom overlap. We propose to use a similarity measure based on users' item reviews in order to estimate similarities in item preferences in the absence of co-rated items. Although it is commonly believed that item reviews are descriptive of a user's item preferences, it is not clear whether they indeed are, and if so, what about a user's item preferences they describe. We present empirical results indicating that the proposed review-based similarity measure captures features in users' item preferences that are different from those captured on co-rated items. Astonishingly, we find that 10 keywords of a user's item reviews suffice to represent a user's item preferences. Independently, we argue that the proposed review-based similarity measure is particularly suitable for use in decentralized recommender systems due to three design properties. First, it can be calculated between any pair of users who hold item reviews. Second, it can be calculated bilaterally without involvement of a third party. And third, it does not require revealing a user's plain review text.

Keywords
review-based similarity, word mover's distance, word embedding, fasttext, keyword extraction, YAKE

1. Introduction

We denote by scarcity the situation that only a small subset of rows in the user-item matrix is available for recommendation. Scarcity is commonly encountered in decentralized recommender systems, in which users only have access to a small subset of other users. Scarcity is often considered beneficial for user privacy [1, 2], yet detrimental to recommendation performance. Sharing rating profiles in order to alleviate scarcity is problematic, as rating profiles are often considered personal data, and sensitive as such. A compromise is commonly made by only sharing ratings with similar users, where similarity is measured with respect to item preference. As similarities are traditionally calculated on users' item ratings, it is not trivial to find users with similar item preferences without sharing one's item ratings.

Approximate and exact methods have been proposed to calculate the similarity between users on ratings without revealing them. Lathia et al. [3] propose an approximate similarity measure that requires disclosing neither the rated items nor their actual ratings. Other approximate methods include profile obfuscation [4, 5, 6]. Exact similarity estimates can be obtained through cryptographic methods bilaterally [7, 8], or with the help of a third party [9]. Despite the feasibility of computing similarities in a privacy-preserving fashion, the problem remains that similarity measures usually require that some items are rated by both users. Such items are typically denoted co-rated items. As user-item matrices are commonly very sparse, similarity measures based on co-rated items are undefined for a majority of user-user pairs. This circumstance is exacerbated under scarcity.
We propose to calculate similarities between users on the basis of their item reviews instead of co-rated items. We particularly address scenarios in which sparsity meets scarcity. Reviews are commonly believed to be descriptive of a user's item preferences. This belief is supported by reports on the success of state-of-the-art algorithms that leverage reviews [10, 11]. However, a recent review by Sachdeva and McAuley [12] puts this belief into question. Their findings indicate that state-of-the-art algorithms that leverage item reviews do not consistently outperform even simple baselines that do not. This inconsistency raises the question whether state-of-the-art methods are in fact able to reliably extract information from reviews that is beneficial for recommendation. Reviews remain complex inputs to recommendation algorithms that seem to defy current endeavors to extract users' item preferences reliably.

In order to better understand what about users' item preferences is reflected in reviews, we propose a similarity measure that compares users on the basis of their item reviews. We report findings on our pilot experiments indicating that the proposed similarity measure (a) indeed captures similarity in users' item preferences, and (b) captures features that are different from those captured by co-rated-items-based similarity measures.

Independently from the above results, we find that the design of the proposed review-based similarity measure motivates its use in decentralized recommender systems due to three design properties. First, it can be calculated between any pair of users who hold item reviews. Second, it can be calculated bilaterally without involvement of a third party. And third, it does not require revealing a user's plain review text.

[Figure 1: Similarity comparison between users u and v on the basis of their item reviews. The comparison procedure follows a six-step approach based on [13]: 1. Concatenate Reviews, 2. Drop Stop Words, 3. Extract Weighted Keywords (YAKE), 4. Map Keywords to Word Vectors, 5. Calculate Word Mover's Distance (WMD), 6. Transform WMD to Similarity Measure.]

2. Concept

We follow along the lines of the user-user similarity measure proposed by Eichinger et al. [13]. It was originally proposed as a general-purpose similarity measure on texting data. In the paper at hand, we instead apply it to item reviews and show that it particularly captures similarity in users' item preferences.

Similarity comparison can be summarized as a six-step approach, as shown in Figure 1. We first elaborate on Steps 4-6 in Section 2.1, which constitute the core of the similarity measure. Afterwards, in Section 2.2, we focus on optional steps such as text preprocessing and keyword extraction, comprising Steps 1-3. Eichinger et al. originally proposed keyword extraction on the basis of tf-idf features. In contrast to the original work, we instead apply a state-of-the-art keyword extractor, which additionally allows users to run keyword extraction independently from other users.

2.1. From Document Distance to Review Similarity

4. Map Keywords to Word Vectors (Figure 1): Kusner et al. [14] propose the Word Mover's Distance (WMD), a distance metric between text documents that are each represented by a subset of their words.¹ The WMD is designed such that text documents that hold semantically similar words, and thus not necessarily the same words, are close. Semantic similarity between words is captured by word embeddings. Word embeddings map words to word vectors such that word vectors of semantically similar words are close. Words need not necessarily be keywords. Note that all users need to use the same word embedding model, wherefore we use a publicly available pre-trained word embedding model.

¹ The WMD is more broadly known as Earth Mover's Distance (EMD), where the EMD is in turn a special Wasserstein metric.
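To make Step 4 concrete, the following is a minimal sketch of mapping keywords to word vectors with the fasttext Python library. It assumes that the pre-trained model cc.en.300.bin (see Section 3) has already been downloaded; the keyword list is purely illustrative.

```python
import fasttext

# Load the publicly available pre-trained 300-dimensional English model.
# Assumes cc.en.300.bin has been downloaded from fasttext.cc beforehand.
model = fasttext.load_model("cc.en.300.bin")

# Map each keyword to its word vector. fasttext composes vectors from
# subword n-grams, so even tokens unseen at training time receive a vector.
keywords = ["awesome", "performance", "usability", "functional"]  # illustrative
vectors = {w: model.get_word_vector(w) for w in keywords}
```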
5. Calculate Word Mover's Distance (Figure 1): The WMD leverages word vectors that condense semantic similarity between single words in order to measure semantic similarity between sets of words. More precisely, the WMD compares so-called signatures.² Signatures are sets of word vectors in which every word vector is associated with a word weight. The number of word vectors in a signature is called the signature size. The distance between two signatures, associated with the distance between two text documents, is then determined by solving a transportation problem (see [14] for details). The WMD can be calculated bilaterally and independently of other users upon the exchange of signatures.

6. Transform WMD to Similarity Measure (Figure 1): We transform the WMD distance metric into a similarity measure. Note that the WMD distance between two similar text documents is close to zero, whereas dissimilar text documents may yield arbitrarily large WMD distances. Hence, we first limit the co-domain to WMD(s, t) ∈ [0, 2] for any pair of signatures s and t. We do so by using the cosine distance³ to measure distances between word vectors and normalizing word weights in a signature such that they sum to 1.⁴ We then obtain a similarity measure upon the following linear transformation:

    $\mathrm{sim}_{\mathrm{WMD}}(s, t) := 1 - \tfrac{1}{2}\,\mathrm{WMD}(s, t) \in [0, 1].$

The signature size is the sole hyperparameter of the WMD, and thus also of the associated similarity measure sim_WMD. We will specify the signature size where required, yet omit it in the notation for reasons of brevity.

² The term signature has been coined by Rubner et al. [15] in the domain of computer vision as an abstraction of color histograms.
³ $d_{\cos}(u, v) = 1 - \frac{\langle u, v \rangle}{\lVert u \rVert \cdot \lVert v \rVert}$, where $\langle \cdot, \cdot \rangle$ denotes the dot product and $\lVert \cdot \rVert$ the Euclidean norm.
⁴ If the Euclidean distance is preferred, we can alternatively normalize vectors to length 1 and normalize word weights such that they sum to 1.
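The following sketch illustrates Steps 5 and 6 under the above conventions (cosine ground distance, weights normalized to sum to 1), using NumPy and the pyemd library referenced in Section 3. Representing a signature as a dict mapping words to (vector, weight) pairs is our own illustrative choice, not prescribed by the paper.

```python
import numpy as np
from pyemd import emd


def cosine_distance(u, v):
    """d_cos(u, v) = 1 - <u, v> / (||u|| ||v||); ranges over [0, 2]."""
    d = 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return max(0.0, d)  # clamp tiny negative values from floating-point noise


def sim_wmd(s, t):
    """Similarity in [0, 1] between two signatures s and t.

    A signature is assumed to be a dict {word: (word_vector, weight)}
    whose weights sum to 1.
    """
    vecs = [vec for vec, _ in s.values()] + [vec for vec, _ in t.values()]
    n = len(vecs)
    # Histograms over the joint word list: s's mass first, then t's mass.
    p = np.array([w for _, w in s.values()] + [0.0] * len(t))
    q = np.array([0.0] * len(s) + [w for _, w in t.values()])
    # Ground distances: pairwise cosine distances between word vectors.
    dist = np.array([[cosine_distance(vecs[i], vecs[j]) for j in range(n)]
                     for i in range(n)])
    wmd = emd(p, q, dist)      # solves the underlying transportation problem
    return 1.0 - wmd / 2.0     # WMD in [0, 2]  ->  similarity in [0, 1]
```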
2.2. Keyword Extraction

1. Concatenate Reviews (Figure 1): In order to arrive at a user-specific text document, we first concatenate all item reviews authored by a user in arbitrary order, with blanks between reviews.

2. Drop Stop Words (Figure 1): We drop stop words as a basic text preprocessing step. We do not perform any further preprocessing in order to mitigate the impact of preprocessing on the evaluation of the proposed review-based similarity measure.

3. Extract Weighted Keywords (Figure 1): The computational complexity of the WMD is often prohibitive, as it is supercubic in the signature size. For this reason, the WMD has not found wide adoption. Efforts to lower the computational complexity include approximation [16, 17] and the reduction of the signature size by keyword extraction [13]. In a previous paper, we applied keyword extraction on the basis of the tf-idf word relevance measure [13]. Note that keyword extraction via tf-idf requires keeping track of the global usage of terms in all users' reviews. A more convenient alternative is Yet Another Keyword Extractor (YAKE) by Campos et al. [18, 19]. Their keyword extractor is document-based and works on textual features of single documents. It does not require information on other documents.

YAKE is a weighted keyword extractor.⁵ It attaches positive keyword weights g_i > 0 to every keyword w_i of a text document. Keywords in YAKE are considered more important in describing their underlying text document the smaller their associated keyword weights are. Conversely, WMD word weights are considered more important the larger they are. We therefore reverse the order of the keyword weights g_i for use as word weights in the WMD. We do so via the linear transformation defined by

    $g_i := g_{\max} + g_{\min} - g_i \in [g_{\min}, g_{\max}],$

and consecutive normalization of the word weights such that they sum to 1, where g_min and g_max are the minimum and maximum keyword weights, respectively.

⁵ YAKE also extracts keyphrases. However, we only consider keywords and omit treatment of keyphrases for reasons of simplicity. Although it is also possible to convert keyphrases into vectors, distinct scientific reasoning is required to justify a comparison between signatures that combine word vectors and phrase vectors.

Applying YAKE in conjunction with the above weight transformation on a user's item reviews yields signatures that serve as input to the WMD. We denote by YAKE(u) the thus associated signature of user u's item reviews. Combining this with the results of Section 2.1, we can now define the review-based similarity measure as

    $\mathrm{sim}_{\mathrm{review}}(u, v) := \mathrm{sim}_{\mathrm{WMD}}(\mathrm{YAKE}(u), \mathrm{YAKE}(v)),$

where u and v are users that hold item reviews.
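A minimal sketch of Steps 1-3 together with the weight reversal might look as follows. The yake and NLTK libraries are those referenced in Section 3, while the function name and data layout are illustrative assumptions.

```python
import yake
from nltk.corpus import stopwords  # requires nltk.download("stopwords") once

STOP_WORDS = set(stopwords.words("english"))


def yake_signature(reviews, model, size=10):
    """Build a signature {word: (word_vector, weight)} from one user's reviews."""
    # Step 1: concatenate all of the user's reviews into one document.
    document = " ".join(reviews)
    # Step 2: drop stop words; no further preprocessing.
    document = " ".join(w for w in document.split() if w.lower() not in STOP_WORDS)
    # Step 3: extract `size` weighted unigram keywords. YAKE assigns *lower*
    # weights g_i to *more* important keywords.
    extractor = yake.KeywordExtractor(lan="en", n=1, top=size)
    keywords = extractor.extract_keywords(document)   # list of (word, g_i)
    # Reverse the weight order via g_i := g_max + g_min - g_i ...
    weights = [g for _, g in keywords]
    g_min, g_max = min(weights), max(weights)
    reversed_weights = [g_max + g_min - g for g in weights]
    total = sum(reversed_weights)                     # ... then normalize to sum to 1.
    return {word: (model.get_word_vector(word), g / total)
            for (word, _), g in zip(keywords, reversed_weights)}
```

The review-based similarity measure then composes the two sketches: sim_review(u, v) corresponds to sim_wmd(yake_signature(reviews_u, model), yake_signature(reviews_v, model)).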
3. Evaluation

We present results that contrast the cosine similarity, as a traditional co-rated-items-based similarity measure, with the proposed review-based similarity measure sim_review. We emphasize that the goal we pursue by this comparison is not to argue that the review-based similarity measure is superior to co-rated-items-based similarity measures. Instead, it is our goal to find a solid indication that the review-based similarity measure indeed captures similarity in terms of item preferences between users. We employ the rationale that, if the review-based similarity measure captures similarity between users' item preferences, then it necessarily must perform well in user-based Collaborative Filtering (CF).

We calculate sim_review with the help of the following software contributions.⁶ We use Pele and Werman's Python implementation of the WMD [20].⁷ We use Bojanowski et al.'s publicly available pre-trained fasttext word embedding model cc.en.300.bin [21].⁸ We use the stop word list provided by Bird's Natural Language Toolkit (NLTK) [22]. Finally, we use Campos et al.'s Python library yake [18].⁹

Splits into training and test sets are at a ratio of 80 to 20, where in particular every user's entries are split into portions of training and test entries. We report average results over 5 distinct training-test splits on the usual Root Mean Squared Error (RMSE) accuracy metric.

⁶ https://github.com/TEichinger/WMDtestbed
⁷ https://pypi.org/project/pyemd/
⁸ https://fasttext.cc/docs/en/pretrained-vectors.html
⁹ https://pypi.org/project/yake/

3.1. Data Sets

We present results on two small samples of the Amazon Reviews 5-core (2014) data set [23].¹⁰ The original data set holds roughly 41 million entries on 24 product domains, where every user has at least 5 rating-review pairs. The two samples considered in the paper at hand cover two distinct scenarios of (a) an artificially high and (b) a more realistic density. We now describe their construction.

We draw the first data set, Head-100, by selecting the 100 largest user profiles. It simulates an artificially high density of ratings and reviews, with particularly large amounts of review text per user. As for the second data set, Mix-100, we draw 2 additional data sets, Median-100 and Tail-100, of medium and low density, by selecting the profiles of 100 median and 100 tail users, respectively. We finally sample Mix-100 from the data sets Head-100, Median-100, and Tail-100 at a ratio of 33 to 34 to 33. We construct Mix-100 in this way in order to guarantee the presence of large, medium-sized, and small profiles in the sample. Some descriptive statistics are shown in Table 1. Note that, if similarity is measured on the basis of co-rated items, only 0.03% and 0.01% of all pairwise similarities can be calculated for users in the Median-100 and Tail-100 data sets, respectively. As we compare review-based with co-rated-items-based similarity measures, we omit an analysis on the samples Median-100 and Tail-100, as they simply provide too little ground for comparison.

¹⁰ https://doi.org/10.7910/DVN/V7X3VE

Table 1
Descriptive statistics of the data sets Head-100, Median-100, Tail-100, and Mix-100. *: Signature size between 1 and 100 that produces the best RMSE on user-based CF as per Equation (1); ties are broken by choosing the smallest signature size. **: Percentage of user pairs that have co-rated items in the training set.

                                 Head-100   Median-100   Tail-100   Mix-100
  unique users                        100          100        100       100
  unique items                    180,981          797        499    70,213
  ratings/reviews                 278,927          800        500    83,714
  optimal signature size*               5            -          -        10
  pairs with co-rated items**      81.82%        0.03%      0.01%     8.77%

3.2. Baselines

We apply the following standard mean-centered rating estimation equation for user-based CF:

    $\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N_{u,i}} \mathrm{sim}(u, v)\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in N_{u,i}} \mathrm{sim}(u, v)},$    (1)

where $\hat{r}_{u,i}$ denotes an estimated rating for user u on item i, $\bar{r}_u$ the mean rating of user u, $N_{u,i}$ some neighborhood of users of user u that have rated item i, sim a user-user similarity measure, and $r_{v,i}$ the true rating of user v on item i. For reasons of brevity, we say that a similarity measure outperforms another when in fact we mean that rating estimation as per Equation (1) equipped with the one similarity measure outperforms that equipped with the other.

We propose two similarity measures as baselines for comparison with sim_review. First, the cosine similarity as a similarity measure based on mutually rated items. And second, a simple arithmetic mean with equal similarity weights sim_mean(u, v) = 1/|N_{u,i}| for all users v ∈ N_{u,i}. If sim_review outperforms sim_mean, it is an indication that sim_review does capture similarity in item preference between users, that is, more than an estimate without prior knowledge of reviews. If further sim_review outperforms sim_cosine, it is an indication that the review-based similarity measure captures similarity in item preference at least on a par with co-rated-items-based similarity measures.
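As a minimal sketch, Equation (1) can be implemented as follows; the in-memory dict layout (user to item to rating) and the function names are illustrative assumptions, not the paper's code.

```python
def predict_rating(u, i, ratings, mean_rating, sim, neighbors):
    """Mean-centered user-based CF estimate r_hat(u, i) as per Equation (1).

    `ratings` maps user -> {item: rating}, `mean_rating` maps user -> mean
    rating, `sim` is a user-user similarity function, and `neighbors`
    plays the role of the neighborhood N_{u,i}.
    """
    numerator, denominator = 0.0, 0.0
    for v in neighbors:
        if v == u or i not in ratings[v]:
            continue                      # only neighbors that actually rated i
        numerator += sim(u, v) * (ratings[v][i] - mean_rating[v])
        denominator += sim(u, v)
    if denominator == 0.0:
        return mean_rating[u]             # fall back to the user's mean rating
    return mean_rating[u] + numerator / denominator
```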
3.3. Capturing Similarity in Item Preference on Item Reviews

We present findings that indicate that the review-based similarity measure sim_review captures similarity in item preference. We first contrast the statistical properties of sim_review and sim_cosine, where we assume that sim_cosine already captures some similarity in item preference. We then measure their respective impact on rating estimation as per Equation (1), acting in the role of (a) similarity weights, and (b) a neighborhood selection criterion. In order to study the impact due to (a) and (b) individually, we first omit neighborhood selection by setting N_{u,i} as the set of all other users v ≠ u and applying sim_review similarity weights, and then conversely omit similarity-based weighted averaging by using sim_mean similarity weights and setting N_{u,i} as the set of k users that are most similar to user u with respect to sim_review and have rated item i.

3.3.1. Statistical Properties

The similarity measures sim_review and sim_cosine capture distinct aspects of similarity between users' item preferences. We find that sim_review and sim_cosine are only weakly positively correlated with respect to the Spearman rank correlation (0.36 on Head-100, and 0.25 on Mix-100). If, in contrast, both similarity measures were strongly positively correlated, this would indicate that both similarity measures capture similar aspects of similarity. In that case, we would also expect that both similarity measures yield similar recommendation performance.
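The reported rank correlations can be reproduced along the following lines, assuming precomputed pairwise similarities restricted to user pairs on which the cosine similarity is defined; `users`, `sim_review`, and `sim_cosine` are assumed helpers, not part of the paper's code.

```python
from itertools import combinations
from scipy.stats import spearmanr

# sim_cosine is assumed to return None for pairs without co-rated items,
# on which it is undefined; such pairs are excluded from the comparison.
pairs = [(u, v) for u, v in combinations(users, 2)
         if sim_cosine(u, v) is not None]
review_sims = [sim_review(u, v) for u, v in pairs]
cosine_sims = [sim_cosine(u, v) for u, v in pairs]
rho, _ = spearmanr(review_sims, cosine_sims)  # ~0.36 on Head-100, ~0.25 on Mix-100
```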
We further find that sim_review and sim_cosine produce distinct similarity distributions. Figure 2 shows the similarity distributions in the data sets Head-100 and Mix-100. We observe that the cosine similarity's distribution follows the shape of an exponential distribution in both Head-100 and Mix-100. In contrast, sim_review's distribution follows the shape of a normal distribution in Head-100, and a mix of normal distributions with distinct modes in Mix-100. Review-based similarities in Head-100, Median-100, and Tail-100 seem to have distinct modes that interfere in Mix-100.

[Figure 2: Comparison of histograms of pairwise similarities for the review-based similarity measure sim_review and the cosine similarity sim_cosine for the training sets Head-100 and Mix-100.]

3.3.2. Similarity-based Weighted Averaging

We find that sim_review similarity weights provide significantly better recommendation performance if profiles are not exclusively large, such as in Mix-100. If, in contrast, profiles are exclusively large, such as in Head-100, sim_review similarity weights, sim_cosine similarity weights, and sim_mean similarity weights perform similarly, as shown in Table 2. None performs significantly better on Head-100. On Mix-100, however, sim_review significantly outperforms the alternatives for n_min ∈ {2, 3}. For n_min = 5, superiority is not statistically significant despite the large absolute margin, due to sim_review's high empirical standard deviation.

Table 2
Recommendation accuracy (RMSE) over various minimum neighborhood sizes n_min = min_{u,i} |N_{u,i}| on the Head-100 and Mix-100 data sets. Bold values are per-column best values, where asterisks indicate statistical significance of paired t-tests against the alternatives (α = 0.05). Hyphens indicate that no ratings could be estimated due to insufficient neighborhood size. Weighing ratings via sim_review in Equation (1) consistently outperforms the baselines for n_min ≤ 5 on average.

  Head-100
  n_min          1       2       3       5       10
  sim_review   0.962   0.911   0.899   0.903   0.880
  sim_cosine   0.968   0.923   0.905   0.904   0.877
  sim_mean     0.962   0.912   0.900   0.903   0.880

  Mix-100
  n_min          1       2       3       5       10
  sim_review   1.084   1.019*  0.985*  0.905    -
  sim_cosine   1.087   1.033   0.997   0.932    -
  sim_mean     1.085   1.022   0.988   0.988    -

3.3.3. Similarity-based Profile Selection

We find that performing rating estimation on only the k most similar user profiles based on sim_review outperforms both sim_cosine and sim_mean on average. More concretely, we see in Figure 3 that sim_review and sim_cosine perform similarly for k ≥ 40 on both Head-100 and Mix-100. For k ≤ 30, we see that decreasing k simultaneously yields decreasing RMSE values on Head-100. On Mix-100, only the RMSE of sim_review decreases for decreasing k, while the RMSE of sim_cosine essentially stays the same. This is due to the fact that many pairwise sim_cosine values are undefined, such that increasing k does not yield larger neighborhoods N_{u,i}. We observe further that RMSE mean values tend to decrease with decreasing parameter values k, whereas RMSE standard deviations increase with decreasing parameter values k.

[Figure 3: Recommendation accuracy (RMSE) on the k most similar profiles on the Head-100 (n_min = 10) and Mix-100 (n_min = 5) data sets with sim_mean similarity weights. Error bars show ± one standard deviation. For k = 10 in Head-100, no ratings could be estimated due to a lack of ratings.]
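A minimal sketch of this profile selection mode: restrict the neighborhood to the k profiles most similar under sim_review that have rated the target item, then estimate the rating with equal sim_mean weights. The names mirror the illustrative Equation (1) sketch above and are not the paper's code.

```python
def top_k_neighborhood(u, i, users, ratings, sim, k):
    """N_{u,i}: the k users most similar to u (under `sim`) that rated item i."""
    candidates = [v for v in users if v != u and i in ratings[v]]
    candidates.sort(key=lambda v: sim(u, v), reverse=True)  # most similar first
    return candidates[:k]

# Rating estimation with equal (sim_mean) weights over this neighborhood:
# predict_rating(u, i, ratings, mean_rating, sim=lambda a, b: 1.0,
#                neighbors=top_k_neighborhood(u, i, users, ratings, sim_review, k))
```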
4. Related Work

We find related work on the following three aspects. First, leveraging review text for recommendation in general. Second, estimating similarity in item preference without using ratings. And third, alternatives to the YAKE keyword extractor and fasttext word embedding models used in sim_review.

Item Reviews for Item Recommendation: There is a wealth of work that aims to leverage item reviews in order to improve recommendation performance. Sachdeva and McAuley [12] recently presented a review of state-of-the-art recommender algorithms that leverage review data. They categorize them into two tracks: first, algorithms that use reviews for regularization at algorithm training time [24, 25]; and second, algorithms that use review-based features at recommendation time [26, 10, 11, 25, 27, 28, 29]. In the paper at hand, we propose a review-based similarity measure as a feature that captures similarity in users' item preferences. We thus contribute to the second category.

Estimating Similarity in Item Preference without Using Ratings: Similarity in item preference can, for instance, be estimated on the basis of the shared context of users. Wainakh et al. [2] show that users who share a social context also tend to share item preferences. More precisely, they show that profiles sampled from users close in the social graph provide better recommendation accuracy on an association rule mining algorithm as compared to uniformly randomly sampled profiles. De Spindler et al. [30] propose to use geo-temporal context between users as a proxy to elicit mutual item preferences in opportunistic networking scenarios.

Alternative Keyword Extractors and Word Embeddings: The literature proposes a large spectrum of keyword extractors and word embedding models. We apply YAKE as a state-of-the-art keyword extractor [18, 19]. It runs on single documents rather than a corpus of documents. Keyword extraction can thus be performed by users individually. An alternative that also runs on single documents is RAKE [31]. A majority of keyword extractors require a document corpus for keyword extraction [32, 33, 34]. We apply a fasttext word embedding model since it can map word tokens that have not been seen at training time by leveraging subword information [21]. An alternative that also leverages subword information is, for instance, LexVec [35]. A majority of word embedding models do not leverage subword information and can thus only map word tokens available at training time [36, 37, 38, 39].

5. Conclusion

We propose a review-based user-user similarity measure that presents an alternative to traditional co-rated-items-based similarity measures. It is particularly suitable if two users do not have co-rated items and would thus default to an undefined user-user similarity. Similarities can now be calculated on the basis of item reviews instead of co-rated items.

We find that the proposed review-based similarity measure captures similarity in users' item preferences. Interestingly, the proposed review-based similarity measure captures different features from those captured on co-rated items. The difference can be linked implicitly to the difference in their statistical features, such as similarity distribution and Spearman rank correlation. However, the difference cannot be characterized explicitly, as the review-based similarity measure is based on unsupervised word embeddings. More precisely, word embeddings find semantic similarity between words, yet do not tell how and in which sense the words are similar.

We conclude that the proposed review-based user-user similarity measure presents a promising feature for recommender system design when item reviews are available. We do not argue that the proposed review-based similarity measure is in any sense superior to co-rated-items-based similarity measures. On the contrary, our findings indicate that they are complementary in modeling users' item preferences.
References

[1] M. Larson, A. Zito, B. Loni, P. Cremonesi, Towards minimal necessary data: The case for analyzing training data requirements of recommender algorithms, in: Proceedings of the 1st FATREC Workshop @ RecSys, 2017, pp. 1–6. doi:10.18122/B2VX12.
[2] A. Wainakh, T. Grube, J. Daubert, M. Mühlhäuser, Efficient privacy-preserving recommendations based on social graphs, in: Proceedings of the 13th ACM Conference on Recommender Systems, ACM, 2019, pp. 78–86. doi:10.1145/3298689.3347013.
[3] N. Lathia, S. Hailes, L. Capra, Private distributed collaborative filtering using estimated concordance measures, in: Proceedings of the 1st ACM Conference on Recommender Systems, ACM, 2007, pp. 1–8. doi:10.1145/1297231.1297233.
[4] S. Berkovsky, T. Kuflik, F. Ricci, The impact of data obfuscation on the accuracy of collaborative filtering, Expert Systems with Applications 39 (2012) 5033–5042. doi:10.1016/j.eswa.2011.11.037.
[5] R. Parameswaran, D. M. Blough, Privacy preserving collaborative filtering using data obfuscation, in: Proceedings of the 3rd IEEE International Conference on Granular Computing, 2007, pp. 380–380. doi:10.1109/GrC.2007.133.
[6] H. Polat, W. Du, Privacy-preserving collaborative filtering using randomized perturbation techniques, in: Proceedings of the 3rd IEEE International Conference on Data Mining, 2003, pp. 625–628. doi:10.1109/ICDM.2003.1250993.
[7] M. Alaggan, S. Gambs, A.-M. Kermarrec, Private similarity computation in distributed systems: from cryptography to differential privacy, in: OPODIS, volume 7109 of LNCS, Springer Berlin Heidelberg, 2011, pp. 357–377. doi:10.1007/978-3-642-25873-2_25.
[8] D. Yang, C. Lin, B. Yang, A novel secure cosine similarity computation scheme with malicious adversaries, International Journal of Network Security & Its Applications 5 (2013) 171–178. doi:10.5281/zenodo.4032143.
[9] J. Zhang, S. Hu, Z. L. Jiang, Privacy-preserving similarity computation in cloud-based mobile social networks, IEEE Access 8 (2020) 111889–111898. doi:10.1109/ACCESS.2020.3003373.
[10] F. Chen, Z. Dong, Z. Li, X. He, Federated meta-learning for recommendation, 2018. URL: http://arxiv.org/abs/1802.07876.
[11] Y. Tay, A. T. Luu, S. C. Hui, Multi-pointer co-attention networks for recommendation, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2018, pp. 2309–2318. doi:10.1145/3219819.3220086.
[12] N. Sachdeva, J. McAuley, How useful are reviews for recommendation? A critical review and potential improvements, in: Proceedings of the 43rd International ACM SIGIR Conf. on R&D in Information Retrieval, ACM, 2020, pp. 1845–1848. doi:10.1145/3397271.3401281.
[13] T. Eichinger, F. Beierle, S. U. Khan, R. Middelanis, V. Sekar, S. Tabibzadeh, affinity: A system for latent user similarity comparison on texting data, in: Proceedings of the 53rd IEEE Int. Conf. on Comm., IEEE, 2019, pp. 1–7. doi:10.1109/ICC.2019.8761051.
[14] M. Kusner, Y. Sun, N. Kolkin, K. Weinberger, From word embeddings to document distances, in: Proceedings of the 32nd Int. Conf. on Machine Learning, volume 37, JMLR.org, 2015, pp. 957–966. doi:10.5555/3045118.3045221.
[15] Y. Rubner, C. Tomasi, L. Guibas, A metric for distributions with applications to image databases, in: Proceedings of the 6th IEEE International Conference on Computer Vision, IEEE, 1998, pp. 59–66. doi:10.1109/ICCV.1998.710701.
[16] M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Advances in Neural Information Processing Systems 26 (2013) 2292–2300. URL: https://papers.nips.cc/paper/2013/hash/af21d0c97db2e27e13572cbf59eb343d-Abstract.html.
[17] K. Atasu, T. Mittelholzer, Linear-complexity data-parallel earth mover's distance approximations, in: Proceedings of the 36th International Conference on Machine Learning, volume 97, PMLR, 2019, pp. 364–373. URL: http://proceedings.mlr.press/v97/atasu19a.html.
[18] R. Campos, V. Mangaravite, A. Pasquali, A. Jorge, C. Nunes, A. Jatowt, YAKE! Keyword extraction from single documents using multiple local features, Information Sciences 509 (2020) 257–289. doi:10.1016/j.ins.2019.09.013.
[19] R. Campos, V. Mangaravite, A. Pasquali, A. Jorge, C. Nunes, A. Jatowt, YAKE! Collection-independent automatic keyword extractor, in: Advances in Information Retrieval, volume 10772 of LNCS, Springer, 2018, pp. 806–810. doi:10.1007/978-3-319-76941-7_80.
[20] O. Pele, M. Werman, Fast and robust earth mover's distances, in: Proceedings of the 12th IEEE International Conference on Computer Vision, IEEE, 2009, pp. 460–467. doi:10.1109/ICCV.2009.5459199.
[21] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics 5 (2017) 135–146. doi:10.1162/tacl_a_00051.
[22] S. Bird, NLTK: The Natural Language Toolkit, in: Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, ACL, 2006, pp. 69–72. doi:10.3115/1225403.1225421.
[23] J. McAuley, C. Targett, Q. Shi, A. van den Hengel, Image-based recommendations on styles and substitutes, in: Proceedings of the 38th International ACM SIGIR Conf. on R&D in Information Retrieval, ACM, 2015, pp. 43–52. doi:10.1145/2766462.2767755.
[24] J. McAuley, J. Leskovec, Hidden factors and hidden topics: Understanding rating dimensions with review text, in: Proceedings of the 7th ACM Conference on Recommender Systems, ACM, 2013, pp. 165–172. doi:10.1145/2507157.2507163.
[25] R. Catherine, W. Cohen, TransNets: Learning to transform for recommendation, in: Proceedings of the 11th ACM Conference on Recommender Systems, ACM, 2017, pp. 288–296. doi:10.1145/3109859.3109878.
[26] L. Zheng, V. Noroozi, P. S. Yu, Joint deep modeling of users and items using reviews for recommendation, in: Proceedings of the 10th ACM International Conference on Web Search and Data Mining, ACM, 2017, pp. 425–434. doi:10.1145/3018661.3018665.
[27] Y. Bao, H. Fang, J. Zhang, TopicMF: Simultaneously exploiting ratings and reviews for recommendation, in: Proceedings of the 28th AAAI Conference on Artificial Intelligence, AAAI Press, 2014, pp. 2–8. doi:10.5555/2893873.2893874.
[28] K. Bauman, A. Tuzhilin, Discovering contextual information from user reviews for recommendation purposes, in: Proceedings of the 1st CBRecSys Workshop @ RecSys, CEUR-WS, 2014, pp. 2–9. URL: http://ceur-ws.org/Vol-1245/cbrecsys2014-paper01.pdf.
[29] P. G. Campos, N. Rodríguez-Artigot, I. Cantador, Extracting context data from user reviews for recommendation: A linked data approach, in: Proceedings of the 1st ComplexRec Workshop @ RecSys, CEUR-WS, 2017, pp. 14–18. URL: http://ceur-ws.org/Vol-1892/paper3.pdf.
[30] A. De Spindler, M. C. Norrie, M. Grossniklaus, Collaborative filtering based on opportunistic information sharing in mobile ad-hoc networks, in: Proceedings of the 2007 OTM Confederated Int. Conf., Springer Berlin Heidelberg, 2007, pp. 408–416. doi:10.5555/1784607.1784643.
[31] S. Rose, D. Engel, N. Cramer, W. Cowley, Automatic keyword extraction from individual documents, in: Text Mining, John Wiley & Sons Ltd., 2010, pp. 1–20. doi:10.1002/9780470689646.ch1.
[32] S. R. El-Beltagy, A. Rafea, KP-Miner: A keyphrase extraction system for English and Arabic documents, Information Systems 34 (2009) 132–144. doi:10.1016/j.is.2008.05.002.
[33] R. Mihalcea, P. Tarau, TextRank: Bringing order into text, in: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, ACL, 2004, pp. 404–411. URL: https://aclanthology.org/W04-3252.
[34] A. Bougouin, F. Boudin, B. Daille, TopicRank: Graph-based topic ranking for keyphrase extraction, in: Proceedings of the 6th International Joint Conference on Natural Language Processing, AFNLP, 2013, pp. 543–551. URL: https://aclanthology.org/I13-1062.
[35] A. Salle, A. Villavicencio, Incorporating subword information into matrix factorization word embeddings, in: Proceedings of the 2nd Workshop on Subword/Character Level Models, ACL, 2018, pp. 66–71. doi:10.18653/v1/W18-1209.
[36] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, in: Workshop Track Proceedings of the 1st International Conference on Learning Representations, 2013. URL: https://arxiv.org/abs/1301.3781.
[37] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, ACL, 2014, pp. 1532–1543. doi:10.3115/v1/D14-1162.
[38] O. Levy, Y. Goldberg, Linguistic regularities in sparse and explicit word representations, in: Proceedings of the 18th Conference on Computational Natural Language Learning, ACL, 2014, pp. 171–180. doi:10.3115/v1/W14-1618.
[39] M. Nickel, D. Kiela, Poincaré embeddings for learning hierarchical representations, Advances in Neural Information Processing Systems 30 (2017) 6338–6347. URL: https://proceedings.neurips.cc/paper/2017/hash/59dfa2df42d9e3d41f5b02bfc32229dd-Abstract.html.