Reviews Are Gold!? On the Link between Item Reviews
and Item Preferences
Tobias Eichinger1
1 Technical University of Berlin, Straße des 17. Juni 135, Berlin, 10623, Germany


                                             Abstract
User-user similarities in recommender systems are traditionally assessed on co-rated items. As ratings encode item preferences, similarities on co-rated items capture similarities in item preferences. However, a majority of similarities are undefined, as particularly small profiles seldom overlap. We propose to use a similarity measure based on users' item reviews in order to estimate similarities in item preferences in the absence of co-rated items. Although it is commonly believed that item reviews are descriptive of a user's item preferences, it is not clear whether, and if so which aspects of, a user's item preferences item reviews describe. We present empirical results indicating that the proposed review-based similarity measure captures features in users' item preferences that are different from those captured on co-rated items. Astonishingly, we find that 10 keywords of a user's item reviews suffice to represent a user's item preferences. Independently, we argue that the proposed review-based similarity measure is particularly suitable for use in decentralized recommender systems owing to three design properties. First, it can be calculated between any pair of users who hold item reviews. Second, it can be calculated bilaterally without involvement of a third party. And third, it does not require revealing a user's plain review text.

                                             Keywords
review-based similarity, word mover's distance, word embedding, fasttext, keyword extraction, YAKE



3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) & 5th Edition of Recommendation in Complex Environments (ComplexRec) Joint Workshop @ RecSys 2021, September 27 – October 1, 2021, Amsterdam, Netherlands
Email: tobias.eichinger@tu-berlin.de (T. Eichinger)
Homepage: https://www.snet.tu-berlin.de/menue/team/tobias_eichinger/ (T. Eichinger)
ORCID: 0000-0002-8351-2823 (T. Eichinger)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


1. Introduction

We denote by scarcity the situation that only a small subset of rows in the user-item matrix are available for recommendation. Scarcity is commonly encountered in decentralized recommender systems in which users only have access to a small subset of other users. Scarcity is often considered beneficial for user privacy [1, 2], yet detrimental to recommendation performance. Sharing rating profiles in order to alleviate scarcity is problematic, as rating profiles are often considered personal data, and sensitive as such. A compromise is commonly made by only sharing ratings with similar users, where similarity is measured with respect to item preference. As similarities are traditionally calculated on users' item ratings, it is not trivial to find users with similar item preferences without sharing one's item ratings.

Approximate and exact methods have been proposed to calculate the similarity between users on ratings without revealing them. Lathia [3] proposes an approximate similarity measure that requires disclosing neither the rated items nor their actual ratings. Other approximate methods include profile obfuscation [4, 5, 6]. Exact similarity estimates can be obtained through cryptographic methods bilaterally [7, 8], or with the help of a third party [9]. Despite the feasibility of computing similarities in a privacy-preserving fashion, the problem remains that similarity measures usually require that some items are rated by both users. Such items are typically denoted co-rated items. As user-item matrices are commonly very sparse, similarity measures based on co-rated items are undefined for a majority of user-user pairs. This circumstance is exacerbated under scarcity.

We propose to calculate similarities between users on the basis of their item reviews instead of co-rated items. We particularly address scenarios in which sparsity meets scarcity. Reviews are commonly believed to be descriptive of a user's item preferences. This belief is supported by reports on the success of state-of-the-art algorithms that leverage reviews [10, 11]. However, a recent review by Sachdeva and McAuley [12] puts this belief into question. Their findings indicate that state-of-the-art algorithms that leverage item reviews do not consistently outperform even simple baselines that do not. This inconsistency raises the question whether state-of-the-art methods are in fact able to reliably extract information from reviews that is beneficial for recommendation. Reviews remain complex inputs to recommendation algorithms that seem to defy current endeavors to extract users' item preferences reliably.

In order to better understand what about users' item preferences is reflected in reviews, we propose a similarity measure that compares users on the basis of their item reviews. We report findings on our pilot experiments indicating that the proposed similarity measure (a) indeed captures similarity in users' item preferences, and (b) captures features that are different from those captured by co-rated-items-based similarity measures.

Independently from the above results, we find that the design of the proposed review-based similarity measure motivates its use in decentralized recommender systems owing to three design properties. First, it can be calculated between any pair of users who hold item reviews. Second, it can be calculated bilaterally without involvement of a third party. And third, it does not require revealing a user's plain review text.

[Figure 1 here: six-step pipeline for two users 𝑢 and 𝑣 — 1. Concatenate Reviews; 2. Drop Stop Words; 3. Extract Weighted Keywords (YAKE), e.g. "awesome": 0.65, "functional": 0.35 for 𝑢 and "performance": 0.8, "usability": 0.2 for 𝑣; 4. Map Keywords to Word Vectors, yielding signatures 𝑠 and 𝑡; 5. Calculate Word Mover's Distance WMD(𝑠, 𝑡); 6. Transform WMD to Similarity Measure 𝑠𝑖𝑚YAKE & WMD(𝑢, 𝑣).]

Figure 1: Similarity comparison between users 𝑢 and 𝑣 on the basis of their item reviews. The comparison procedure follows a six-step approach based on [13].


2. Concept

We follow along the lines of the user-user similarity measure proposed by Eichinger et al. [13]. It has originally been proposed as a general-purpose similarity measure on texting data. In the paper at hand, we instead apply it to item reviews and show that it particularly captures similarity in users' item preferences.

Similarity comparison can be summarized as a six-step approach as shown in Figure 1. We first elaborate on Steps 4.–6. in Section 2.1, which constitute the core of the similarity measure. Afterwards, in Section 2.2, we focus on optional steps such as text preprocessing and keyword extraction, comprising Steps 1.–3. Eichinger et al. originally proposed keyword extraction on the basis of tf-idf features. In contrast to the original work, we instead apply a state-of-the-art keyword extractor, which additionally allows users to run keyword extraction independently from other users.

2.1. From Document Distance to Review Similarity

4. Map Keywords to Word Vectors (Figure 1): Kusner et al. [14] propose the Word Mover's Distance (WMD), a distance metric between text documents that are each represented by a subset of their words.¹ The WMD is made such that text documents that hold semantically similar words – and thus not necessarily the same words – are close. Semantic similarity between words is captured by word embeddings. Word embeddings map words to word vectors such that word vectors of semantically similar words are close. Words need not necessarily be keywords. Note that all users need to use the same word embedding model, wherefore we use a publicly available pre-trained word embedding model.

    ¹ The WMD is more broadly known as the Earth Mover's Distance (EMD), where the EMD is in turn a special Wasserstein metric.

5. Calculate Word Mover's Distance (Figure 1): The WMD leverages word vectors that condense semantic similarity
between single words, in order to measure semantic similarity between sets of words. More precisely, the WMD compares so-called signatures.² Signatures are sets of word vectors in which every word vector is associated with a word weight. The number of word vectors in a signature is called the signature size. The distance between two signatures, associated with the distance between two text documents, is then determined by solving a transportation problem (see [14] for details). The WMD can be calculated bilaterally and independently of other users upon the exchange of signatures.

    ² The term signature has been coined by Rubner et al. [15] in the domain of computer vision as an abstraction of color histograms.

6. Transform WMD to Similarity Measure (Figure 1): We transform the WMD distance metric into a similarity measure. Note that the WMD distance between two similar text documents is close to zero, whereas dissimilar text documents may yield arbitrarily large WMD distances. Hence, we first limit the co-domain to WMD(𝑠, 𝑡) ∈ [0, 2] for any pair of signatures 𝑠 and 𝑡. We do so by using the cosine distance³ to measure distances between word vectors and by normalizing the word weights in a signature such that they sum to 1.⁴ We then obtain a similarity measure upon the following linear transformation:

    𝑠𝑖𝑚WMD(𝑠, 𝑡) := 1 − (1/2) · WMD(𝑠, 𝑡) ∈ [0, 1].

The signature size is the sole hyperparameter of the WMD, and thus also of the associated similarity measure 𝑠𝑖𝑚WMD. We will specify the signature size where required, yet omit it in the notation for reasons of brevity.

    ³ 𝑑cos(𝑢, 𝑣) = 1 − ⟨𝑢, 𝑣⟩/(‖𝑢‖ · ‖𝑣‖), where ⟨·, ·⟩ denotes the dot product and ‖·‖ the Euclidean norm.
    ⁴ If the Euclidean distance is preferred, we can alternatively normalize word vectors to length 1 and normalize word weights such that they sum to 1.

2.2. Keyword Extraction

1. Concatenate Reviews (Figure 1): In order to arrive at a user-specific text document, we first concatenate all item reviews authored by a user in arbitrary order with blanks between reviews.

2. Drop Stop Words (Figure 1): We drop stop words as a basic text preprocessing step. We do not perform any further preprocessing in order to mitigate the impact of preprocessing on the evaluation of the proposed review-based similarity measure.

3. Extract Weighted Keywords (Figure 1): The computational complexity of the WMD is often prohibitive, as it is supercubic in the signature size. For this reason, the WMD has not found wide adoption. Efforts to lower the computational complexity include approximation [16, 17] and the reduction of the signature size by keyword extraction [13]. In a previous paper, we applied keyword extraction on the basis of the tf-idf word relevance measure [13]. Note that keyword extraction via tf-idf requires keeping track of the global usage of terms in all users' reviews. A more convenient alternative is Yet Another Keyword Extractor (YAKE) by Campos et al. [18, 19]. Their keyword extractor is document-based and works on textual features of single documents. It does not require information on other documents.

YAKE is a weighted keyword extractor.⁵ It attaches positive keyword weights 𝑔𝑖 > 0 to every keyword 𝑤𝑖 of a text document. Keywords in YAKE are considered the more important in describing their underlying text document the smaller their associated keyword weights are. Conversely, WMD word weights are considered the more important the larger they are. We therefore reverse the order of the keyword weights 𝑔𝑖 for use as word weights in the WMD. We do so via the linear transformation defined by 𝑔𝑖 := 𝑔max + 𝑔min − 𝑔𝑖 ∈ [𝑔min, 𝑔max] and a consecutive normalization of the word weights such that they sum to 1, where 𝑔min and 𝑔max denote the minimum and maximum keyword weights respectively.

    ⁵ YAKE also extracts keyphrases. However, we only consider keywords and omit treatment of keyphrases for reasons of simplicity. Although it is also possible to convert keyphrases into vectors, distinct scientific reasoning is required to justify a comparison between signatures that combine word vectors and phrase vectors.

Applying YAKE in conjunction with the above weight transformation to a user's item reviews yields signatures that serve as input to the WMD. We denote by YAKE(𝑢) the thus associated signature of some user 𝑢's item reviews. Combining this with the results of Section 2.1, we can now define the review-based similarity measure as

    𝑠𝑖𝑚review(𝑢, 𝑣) := 𝑠𝑖𝑚WMD(YAKE(𝑢), YAKE(𝑣)),

where 𝑢 and 𝑣 are some users that hold item reviews.


3. Evaluation

We present results that contrast the cosine similarity, as a traditional co-rated-items-based similarity measure, with the proposed review-based similarity measure 𝑠𝑖𝑚review. We emphasize that the goal we pursue by this comparison is not to argue that the review-based similarity measure is superior to co-rated-items-based similarity measures. Instead, it is our goal to find a solid indication that the review-based similarity measure indeed captures similarity in terms of item preferences between users. We employ the rationale that if the review-based similarity measure captures similarity between users' item preferences, then it necessarily must perform well in user-based Collaborative Filtering (CF).

We calculate 𝑠𝑖𝑚review with the help of the following software contributions.⁶ We use Pele and Werman's Python implementation of the WMD [20].⁷ We use Bojanowski et al.'s publicly available pre-trained fasttext word embedding model cc.en.300.bin [21].⁸ We use the stop word list provided by Bird's Natural Language Toolkit (NLTK) [22]. Finally, we use Campos et al.'s Python library yake [18].⁹

    ⁶ https://github.com/TEichinger/WMDtestbed
    ⁷ https://pypi.org/project/pyemd/
    ⁸ https://fasttext.cc/docs/en/pretrained-vectors.html
    ⁹ https://pypi.org/project/yake/

Splits into training and test sets are at a ratio of 80 to 20, where particularly every user's entries are split into portions of training and test entries. We report average results over 5 distinct training-test splits on the usual Root Mean Squared Error (RMSE) accuracy metric.

3.1. Data Sets

We present results on two small samples of the Amazon Reviews 5-core (2014) data set [23].¹⁰ The original data set holds roughly 41 million entries on 24 product domains, where every user has at least 5 rating-review pairs. The two samples considered in the paper at hand cover two distinct scenarios of (a) an artificially high and (b) a more realistic density. We now describe their construction.

    ¹⁰ https://doi.org/10.7910/DVN/V7X3VE

We draw the first data set Head-100 by selecting the 100 largest user profiles. It simulates an artificially high density of ratings and reviews with particularly large amounts of review text per user. As for the second data set Mix-100, we draw 2 additional data sets Median-100 and Tail-100, of medium and low density, by selecting the profiles of 100 median and 100 tail users respectively. We finally sample Mix-100 from the data sets Head-100, Median-100, and Tail-100 at a ratio of 33 to 34 to 33. We construct Mix-100 in this way in order to guarantee the presence of large, medium-sized, and small profiles in the sample. Some descriptive statistics are shown in Table 1. Note that, if similarity is measured on the basis of co-rated items, only 0.03% and 0.01% of all pairwise similarities can be calculated for users in the Median-100 and Tail-100 data sets respectively. As we compare review-based with co-rated-items-based similarity measures, we omit an analysis on the samples Median-100 and Tail-100, as they simply provide too little ground for comparison.

Table 1
Descriptive statistics of the data sets Head-100, Median-100, Tail-100, and Mix-100. *: Signature size between 1 and 100 that produces the best RMSE on user-based CF as per Equation (1). Ties are resolved by choosing the smallest signature size. **: Percentage of user pairs that have co-rated items in the training set.

                                    Head-100    Median-100    Tail-100    Mix-100
    unique users                         100           100         100        100
    unique items                     180,981           797         499     70,213
    ratings/reviews                  278,927           800         500     83,714
    optimal signature size*                5             –           –         10
    pairs with co-rated items**       81.82%         0.03%       0.01%      8.77%

3.2. Baselines

We apply the following standard mean-centered rating estimation equation for user-based CF:

    𝑟̂𝑢,𝑖 = 𝑟̄𝑢 + ( Σ𝑣∈𝑁𝑢,𝑖 𝑠𝑖𝑚(𝑢, 𝑣) · (𝑟𝑣,𝑖 − 𝑟̄𝑣) ) / ( Σ𝑣∈𝑁𝑢,𝑖 𝑠𝑖𝑚(𝑢, 𝑣) ),    (1)

where 𝑟̂𝑢,𝑖 denotes an estimated rating for user 𝑢 on item 𝑖, 𝑟̄𝑢 the mean rating of user 𝑢, 𝑁𝑢,𝑖 some neighborhood of users of user 𝑢 that have rated item 𝑖, 𝑠𝑖𝑚 a user-user similarity measure, and 𝑟𝑣,𝑖 the true rating of user 𝑣 on item 𝑖. For reasons of brevity we say that a similarity measure outperforms another when in fact we mean that rating estimation as per Equation (1) equipped with the one similarity measure outperforms that equipped with the other.

We propose two similarity measures as baselines for comparison with 𝑠𝑖𝑚review. First, the cosine similarity as a similarity measure based on mutually rated items. And second, a simple arithmetic mean having equal similarity weights 𝑠𝑖𝑚mean(𝑢, 𝑣) = 1/|𝑁𝑢,𝑖| for all users 𝑣 ∈ 𝑁𝑢,𝑖. If 𝑠𝑖𝑚review outperforms 𝑠𝑖𝑚mean, it is an indication that 𝑠𝑖𝑚review does capture similarity in item preference between users, that is, more than an estimate without prior knowledge of reviews. If further 𝑠𝑖𝑚review outperforms 𝑠𝑖𝑚cosine, it is an indication that the review-based similarity measure captures similarity in item preference at least on a par with co-rated-items-based similarity measures.
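To make the six-step procedure and Equation (1) concrete, the following minimal sketch scores a pair of users and plugs the result into the rating estimation. It is a sketch under stated assumptions, not the paper's implementation: the 3-dimensional embeddings and the YAKE keyword weights are made up for illustration (the weights mirror the example values shown in Figure 1), and the exact transportation solve of the WMD is replaced by the relaxed WMD lower bound of Kusner et al. [14] so that the example stays dependency-free.

```python
import math

# Toy 3-dimensional word embeddings, stand-ins for the 300-dimensional
# pre-trained fasttext vectors used in the paper.
EMBEDDING = {
    "awesome":     [0.9, 0.1, 0.0],
    "functional":  [0.7, 0.3, 0.1],
    "performance": [0.8, 0.2, 0.1],
    "usability":   [0.2, 0.9, 0.3],
}

def cosine_distance(x, y):
    """d_cos(x, y) = 1 - <x, y> / (||x|| * ||y||), as in footnote 3."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)

def signature(yake_weights):
    """Turn YAKE keyword weights g_i (smaller = more important) into a
    WMD signature: reverse the weight order via g_max + g_min - g_i,
    normalize the weights to sum 1, and map keywords to word vectors."""
    g_min, g_max = min(yake_weights.values()), max(yake_weights.values())
    rev = {w: g_max + g_min - gi for w, gi in yake_weights.items()}
    total = sum(rev.values())
    return [(EMBEDDING[w], wt / total) for w, wt in rev.items()]

def relaxed_wmd(s, t):
    """Relaxed WMD lower bound (Kusner et al.): each word moves its full
    weight to the closest word of the other signature; the max of both
    directions replaces the exact transportation solve of the WMD."""
    def one_way(a, b):
        return sum(wt * min(cosine_distance(x, y) for y, _ in b)
                   for x, wt in a)
    return max(one_way(s, t), one_way(t, s))

def sim_review(u_keywords, v_keywords):
    """sim_WMD(s, t) = 1 - (1/2) * WMD(s, t), applied to YAKE signatures."""
    s, t = signature(u_keywords), signature(v_keywords)
    return 1.0 - 0.5 * relaxed_wmd(s, t)

# Hypothetical YAKE outputs for users u and v (lower weight = better).
u = {"awesome": 0.35, "functional": 0.65}
v = {"performance": 0.2, "usability": 0.8}
s_uv = sim_review(u, v)

# Equation (1): mean-centered user-based CF rating estimation.
def estimate_rating(u_mean, neighbors):
    """neighbors: list of (sim(u, v), r_vi, mean rating of v) for v in N_ui."""
    num = sum(s * (r - m) for s, r, m in neighbors)
    den = sum(s for s, _, _ in neighbors)
    return u_mean + num / den

r_hat = estimate_rating(4.0, [(s_uv, 5.0, 3.5), (0.4, 2.0, 3.0)])
```

In the paper's actual setup, the pyemd solver and the fasttext model cc.en.300.bin take the place of these toy components, and the keyword weights come from running yake on the concatenated, stop-word-filtered reviews.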
[Figure 2 here: two histogram panels, Head-100 and Mix-100, plotting frequency over similarity in [0, 1]; legend: review, cosine.]

Figure 2: Comparison of histograms of pairwise similarities for the review-based similarity measure 𝑠𝑖𝑚review and the cosine
similarity 𝑠𝑖𝑚cosine for the training sets Head-100 and Mix-100.
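Histograms such as those in Figure 2 can be reproduced from the pairwise similarity values by simple equal-width binning over [0, 1]; a minimal dependency-free sketch with hypothetical similarity values (the paper itself presumably uses a plotting library):

```python
def histogram(values, n_bins=10):
    """Count similarity values in n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        # min(...) clamps v == 1.0 into the last bin
        idx = min(int(v * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical pairwise similarities for illustration.
pairwise_sims = [0.05, 0.12, 0.33, 0.48, 0.51, 0.55, 0.61, 0.97]
counts = histogram(pairwise_sims, n_bins=5)
# bins [0,0.2), [0.2,0.4), [0.4,0.6), [0.6,0.8), [0.8,1.0] -> [2, 1, 3, 1, 1]
```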



3.3. Capture Similarity in Item Preference on Item Reviews

We present findings that indicate that the review-based similarity measure 𝑠𝑖𝑚review captures similarity in item preference. We first contrast the statistical properties of 𝑠𝑖𝑚review and 𝑠𝑖𝑚cosine, where we assume that 𝑠𝑖𝑚cosine already captures some similarity in item preference. We then measure their respective impact on rating estimation as per Equation (1), acting in the role of (a) similarity weights, and (b) a neighborhood selection criterion. In order to study the impact due to (a) and (b) individually, we first omit neighborhood selection by setting 𝑁𝑢,𝑖 as the set of all other users 𝑣 ≠ 𝑢 and applying 𝑠𝑖𝑚review similarity weights, and then conversely omit similarity-based weighted averaging by using 𝑠𝑖𝑚mean similarity weights and setting 𝑁𝑢,𝑖 as the set of 𝑘 users that are most similar to user 𝑢 with respect to 𝑠𝑖𝑚review and have rated item 𝑖.

3.3.1. Statistical Properties

The similarity measures 𝑠𝑖𝑚review and 𝑠𝑖𝑚cosine capture distinct aspects of similarity between users' item preferences. We find that 𝑠𝑖𝑚review and 𝑠𝑖𝑚cosine are only weakly positively correlated with respect to the Spearman rank correlation (0.36 on Head-100, and 0.25 on Mix-100). If in contrast both similarity measures were strongly positively correlated, this would indicate that both similarity measures capture similar aspects of similarity. In that case, we would also expect that both similarity measures yield similar recommendation performance.

We further find that 𝑠𝑖𝑚review and 𝑠𝑖𝑚cosine produce distinct similarity distributions. Figure 2 shows the similarity distributions in the data sets Head-100 and Mix-100. We observe that the cosine similarity's distribution follows the shape of an exponential distribution in both Head-100 and Mix-100. In contrast, 𝑠𝑖𝑚review's distribution follows the shape of a normal distribution in Head-100, and a mix of normal distributions with distinct modes in Mix-100. Review-based similarities in Head-100, Median-100, and Tail-100 seem to have distinct modes that interfere in Mix-100.

Table 2
Recommendation accuracy (RMSE) over various minimum neighborhood sizes 𝑛min = min𝑢,𝑖 |𝑁𝑢,𝑖| on the Head-100 and Mix-100 data sets. Bold values are per-column best values, where asterisks indicate statistical significance of paired 𝑡-tests to the alternatives (𝛼 = 0.05). Hyphens indicate that no ratings could be estimated because no neighborhoods of sufficient size exist. We see that weighting ratings via 𝑠𝑖𝑚review in Equation (1) consistently outperforms the baselines for 𝑛min ≤ 5 on average.

    Head-100
    𝑛min            1        2        3        5       10
    𝑠𝑖𝑚review     0.962    0.911    0.899    0.903    0.880
    𝑠𝑖𝑚cosine     0.968    0.923    0.905    0.904    0.877
    𝑠𝑖𝑚mean       0.962    0.912    0.900    0.903    0.880

    Mix-100
    𝑛min            1        2        3        5       10
    𝑠𝑖𝑚review     1.084    1.019*   0.985*   0.905      –
    𝑠𝑖𝑚cosine     1.087    1.033    0.997    0.932      –
    𝑠𝑖𝑚mean       1.085    1.022    0.988    0.988      –

3.3.2. Similarity-based Weighted Averaging

We find that 𝑠𝑖𝑚review similarity weights provide significantly better recommendation performance if profiles are not exclusively large, such as in Mix-100. If in contrast profiles are exclusively large, such as in Head-100, then 𝑠𝑖𝑚review similarity weights, 𝑠𝑖𝑚cosine similarity weights, and 𝑠𝑖𝑚mean similarity weights perform similarly, as shown in Table 2. None performs significantly better on Head-100. On Mix-100, however, 𝑠𝑖𝑚review significantly outperforms the alternatives for 𝑛min ∈ {2, 3}. For 𝑛min = 5, superiority is not statistically significant despite the large absolute margin, due to 𝑠𝑖𝑚review's high empirical standard deviation.

[Figure 3 here: two panels, Head-100 (𝑛min = 10) and Mix-100 (𝑛min = 5), plotting RMSE over the 𝑘 most similar profiles for 𝑘 = 10, 20, …, 100; legend: mean, review, cosine.]

Figure 3: Recommendation accuracy (RMSE) on the 𝑘 most similar profiles on the Head-100 and Mix-100 data sets using 𝑠𝑖𝑚mean similarity weights. Error bars show ± one standard deviation. For 𝑘 = 10 in Head-100, no ratings could be estimated due to a lack of ratings.

3.3.3. Similarity-based Profile Selection

We find that performing rating estimation on only the 𝑘 most similar user profiles based on 𝑠𝑖𝑚review outperforms both 𝑠𝑖𝑚cosine and 𝑠𝑖𝑚mean on average. More concretely, we see in Figure 3 that 𝑠𝑖𝑚review and 𝑠𝑖𝑚cosine perform similarly for 𝑘 ≥ 40 on both Head-100 and Mix-100. For 𝑘 ≤ 30, we see that decreasing 𝑘 simultaneously yields decreasing RMSE values on Head-100. On Mix-100, only the RMSE of 𝑠𝑖𝑚review decreases for decreasing 𝑘, while the RMSE of 𝑠𝑖𝑚cosine essentially stays the same. This is due to the fact that many pairwise 𝑠𝑖𝑚cosine values are undefined, such that increasing 𝑘 does not yield larger neighborhoods 𝑁𝑢,𝑖. We observe further that RMSE mean values tend to decrease with decreasing parameter val-

Item Reviews for Item Recommendation: There is a wealth of work that aims to leverage item reviews in order to improve recommendation performance. Sachdeva and McAuley [12] recently presented a review of state-of-the-art recommender algorithms that leverage review data. They categorize them into two tracks. First, algorithms that use reviews for regularization at algorithm training time [24, 25]. And second, algorithms that use review-based features for use at recommendation time [26, 10, 11, 25, 27, 28, 29]. In the paper at hand, we propose a review-based similarity measure as a feature that captures similarity in users' item preferences. We thus contribute to the second category.

Estimating Similarity in Item Preference without Using Ratings: Similarity in item preference can for instance be estimated on the basis of the shared context of users. Wainakh et al. [2] show that users who share a social context also tend to share item preferences. More precisely, they show that profiles sampled from users close in the social graph provide better recommendation accuracy on an association rules mining algorithm as compared to uniformly randomly sampled profiles. de Spindler et al. [30] propose to use geo-temporal context between users as a proxy to elicit mutual item preferences in opportunistic networking scenarios.
ues 𝑘, whereas RMSE standard deviations increases with           Alternative Keyword Extractors and Word Em-
decreasing parameter values 𝑘.                                beddings: The literature proposes a large spectrum of
                                                              keyword extractors and word embedding models. We ap-
4. Related Work                                               ply YAKE as a state-of-the-art keyword extractor [18, 19].
                                                              It runs on single documents rather than a corpus of doc-
We find related work on the following three aspects. First,   uments. Keyword extraction can thus be performed by
leveraging review text for recommendation in general.         users individually. An alternative that also runs on single
Second, estimating similarity in item preference without      documents is RAKE [31]. A majority of keyword extrac-
using ratings. And third, alternatives to the proposed        tors require a document corpus for keyword extraction
YAKE keyword extractor and fasttext word embedding            [32, 33, 34].
models for use in 𝑠𝑖𝑚review .                                    We apply a fasttext word embedding model since it can
map word tokens that have not been seen at training time           measures, in: Proceedings of the 1st ACM Con-
by leveraging subword information [21]. An alternative             ference on Recommender Systems, ACM, 2007, pp.
that also leverages subword information is for instance            1–8. doi:10.1145/1297231.1297233.
LexVec [35]. A majority of word embedding models does          [4] S. Berkovsky, T. Kuflik, F. Ricci, The impact of data
not leverage subword information and can thus only map             obfuscation on the accuracy of collaborative filter-
word tokens available at training time [36, 37, 38, 39].           ing, Expert Systems with Applications 39 (2012)
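The subword mechanism that lets fasttext embed unseen tokens can be sketched as follows. This is a minimal illustration, not fasttext's actual implementation: the embedding dimension, the number of hash buckets, the randomly initialized bucket table, and the CRC32 hash are illustrative stand-ins (fasttext learns the bucket vectors at training time and uses an FNV-1a hash).

```python
import zlib
import numpy as np

DIM = 16        # illustrative embedding dimension
BUCKETS = 1000  # illustrative number of n-gram hash buckets

# Stand-in for the trained n-gram vector table (fasttext learns this).
rng = np.random.default_rng(0)
ngram_table = rng.normal(size=(BUCKETS, DIM))

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word with boundary markers '<' and '>'."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def embed(word):
    """Embed a word as the mean of its hashed character-n-gram vectors.

    Any token decomposes into character n-grams, so even a token that
    was never seen at training time receives a vector.
    """
    idx = [zlib.crc32(g.encode("utf-8")) % BUCKETS for g in char_ngrams(word)]
    return ngram_table[idx].mean(axis=0)

# An out-of-vocabulary token still maps to a vector of the same dimension:
print(embed("reviewful").shape)  # (16,)
```

In contrast, the vocabulary-based models cited above [36, 37, 38, 39] look a token up in a fixed table and have no vector at all for out-of-vocabulary tokens, which is the behavior the subword decomposition avoids.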
5. Conclusion

We propose a review-based user-user similarity measure that presents an alternative to traditional co-rated-items-based similarity measures. It is particularly suitable when two users have no co-rated items and would thus default to an undefined user-user similarity. Similarities can now be calculated on the basis of item reviews instead of co-rated items.

We find that the proposed review-based similarity measure captures similarity in users' item preferences. Interestingly, it captures features different from those captured on co-rated items. The difference can be linked implicitly to differences in their statistical features, such as the similarity distribution and the Spearman rank correlation. However, the difference cannot be characterized explicitly, as the review-based similarity measure is based on unsupervised word embeddings. More precisely, word embeddings find semantic similarity between words, yet do not tell how and in which sense the words are similar.

We conclude that the proposed review-based user-user similarity measure presents a promising feature for recommender system design when item reviews are available. We do not argue that the proposed review-based similarity measure is in any sense superior to co-rated-items-based similarity measures. On the contrary, our findings indicate that they are complementary in modeling users' item preferences.

References

 [1] M. Larson, A. Zito, B. Loni, P. Cremonesi, Towards minimal necessary data: The case for analyzing training data requirements of recommender algorithms, in: Proceedings of the 1st FATREC Workshop @ RecSys, 2017, pp. 1–6. doi:10.18122/B2VX12.
 [2] A. Wainakh, T. Grube, J. Daubert, M. Mühlhäuser, Efficient privacy-preserving recommendations based on social graphs, in: Proceedings of the 13th ACM Conference on Recommender Systems, ACM, 2019, pp. 78–86. doi:10.1145/3298689.3347013.
 [3] N. Lathia, S. Hailes, L. Capra, Private distributed collaborative filtering using estimated concordance measures, in: Proceedings of the 1st ACM Conference on Recommender Systems, ACM, 2007, pp. 1–8. doi:10.1145/1297231.1297233.
 [4] S. Berkovsky, T. Kuflik, F. Ricci, The impact of data obfuscation on the accuracy of collaborative filtering, Expert Systems with Applications 39 (2012) 5033–5042. doi:10.1016/j.eswa.2011.11.037.
 [5] R. Parameswaran, D. M. Blough, Privacy preserving collaborative filtering using data obfuscation, in: Proceedings of the 3rd IEEE International Conference on Granular Computing, 2007, pp. 380–380. doi:10.1109/GrC.2007.133.
 [6] H. Polat, W. Du, Privacy-preserving collaborative filtering using randomized perturbation techniques, in: Proceedings of the 3rd IEEE International Conference on Data Mining, 2003, pp. 625–628. doi:10.1109/ICDM.2003.1250993.
 [7] M. Alaggan, S. Gambs, A.-M. Kermarrec, Private similarity computation in distributed systems: from cryptography to differential privacy, in: OPODIS, volume 7109 of LNCS, Springer Berlin Heidelberg, 2011, pp. 357–377. doi:10.1007/978-3-642-25873-2_25.
 [8] D. Yang, C. Lin, B. Yang, A novel secure cosine similarity computation scheme with malicious adversaries, International Journal of Network Security & Its Applications 5 (2013) 171–178. doi:10.5281/zenodo.4032143.
 [9] J. Zhang, S. Hu, Z. L. Jiang, Privacy-preserving similarity computation in cloud-based mobile social networks, IEEE Access 8 (2020) 111889–111898. doi:10.1109/ACCESS.2020.3003373.
[10] F. Chen, Z. Dong, Z. Li, X. He, Federated meta-learning for recommendation, 2018. URL: http://arxiv.org/abs/1802.07876.
[11] Y. Tay, A. T. Luu, S. C. Hui, Multi-pointer co-attention networks for recommendation, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2018, pp. 2309–2318. doi:10.1145/3219819.3220086.
[12] N. Sachdeva, J. McAuley, How useful are reviews for recommendation? A critical review and potential improvements, in: Proceedings of the 43rd International ACM SIGIR Conf. on R&D in Information Retrieval, ACM, 2020, pp. 1845–1848. doi:10.1145/3397271.3401281.
[13] T. Eichinger, F. Beierle, S. U. Khan, R. Middelanis, V. Sekar, S. Tabibzadeh, affinity: A system for latent user similarity comparison on texting data, in: Proceedings of the 53rd IEEE Int. Conf. on Comm., IEEE, 2019, pp. 1–7. doi:10.1109/ICC.2019.8761051.
[14] M. Kusner, Y. Sun, N. Kolkin, K. Weinberger, From
     word embeddings to document distances, in: Proceedings of the 32nd Int. Conf. on Machine Learning, volume 37, JMLR.org, 2015, pp. 957–966. doi:10.5555/3045118.3045221.
[15] Y. Rubner, C. Tomasi, L. Guibas, A metric for distributions with applications to image databases, in: Proceedings of the 6th IEEE International Conference on Computer Vision, IEEE, 1998, pp. 59–66. doi:10.1109/ICCV.1998.710701.
[16] M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Advances in Neural Information Processing Systems 26 (2013) 2292–2300. URL: https://papers.nips.cc/paper/2013/hash/af21d0c97db2e27e13572cbf59eb343d-Abstract.html.
[17] K. Atasu, T. Mittelholzer, Linear-complexity data-parallel earth mover's distance approximations, in: Proceedings of the 36th International Conference on Machine Learning, volume 97, PMLR, 2019, pp. 364–373. URL: http://proceedings.mlr.press/v97/atasu19a.html.
[18] R. Campos, V. Mangaravite, A. Pasquali, A. Jorge, C. Nunes, A. Jatowt, YAKE! Keyword extraction from single documents using multiple local features, Information Sciences 509 (2020) 257–289. doi:10.1016/j.ins.2019.09.013.
[19] R. Campos, V. Mangaravite, A. Pasquali, A. Jorge, C. Nunes, A. Jatowt, YAKE! Collection-independent automatic keyword extractor, in: Advances in Information Retrieval, volume 10772 of LNCS, Springer, 2018, pp. 806–810. doi:10.1007/978-3-319-76941-7_80.
[20] O. Pele, M. Werman, Fast and robust earth mover's distances, in: Proceedings of the 12th IEEE International Conference on Computer Vision, IEEE, 2009, pp. 460–467. doi:10.1109/ICCV.2009.5459199.
[21] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics 5 (2017) 135–146. doi:10.1162/tacl_a_00051.
[22] S. Bird, NLTK: The Natural Language Toolkit, in: Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, ACL, 2006, pp. 69–72. doi:10.3115/1225403.1225421.
[23] J. McAuley, C. Targett, Q. Shi, A. van den Hengel, Image-based recommendations on styles and substitutes, in: Proceedings of the 38th International ACM SIGIR Conf. on R&D in Information Retrieval, ACM, 2015, pp. 43–52. doi:10.1145/2766462.2767755.
[24] J. McAuley, J. Leskovec, Hidden factors and hidden topics: Understanding rating dimensions with review text, in: Proceedings of the 7th ACM Conference on Recommender Systems, ACM, 2013, pp. 165–172. doi:10.1145/2507157.2507163.
[25] R. Catherine, W. Cohen, TransNets: Learning to transform for recommendation, in: Proceedings of the 11th ACM Conference on Recommender Systems, ACM, 2017, pp. 288–296. doi:10.1145/3109859.3109878.
[26] L. Zheng, V. Noroozi, P. S. Yu, Joint deep modeling of users and items using reviews for recommendation, in: Proceedings of the 10th ACM International Conference on Web Search and Data Mining, ACM, 2017, pp. 425–434. doi:10.1145/3018661.3018665.
[27] Y. Bao, H. Fang, J. Zhang, TopicMF: Simultaneously exploiting ratings and reviews for recommendation, in: Proceedings of the 28th AAAI Conference on Artificial Intelligence, AAAI Press, 2014, pp. 2–8. doi:10.5555/2893873.2893874.
[28] K. Bauman, A. Tuzhilin, Discovering contextual information from user reviews for recommendation purposes, in: Proceedings of the 1st CBRecSys Workshop @RecSys, CEUR-WS, 2014, pp. 2–9. URL: http://ceur-ws.org/Vol-1245/cbrecsys2014-paper01.pdf.
[29] P. G. Campos, N. Rodríguez-Artigot, I. Cantador, Extracting context data from user reviews for recommendation: A linked data approach, in: Proceedings of the 1st ComplexRec Workshop @RecSys, CEUR-WS, 2017, pp. 14–18. URL: http://ceur-ws.org/Vol-1892/paper3.pdf.
[30] A. De Spindler, M. C. Norrie, M. Grossniklaus, Collaborative filtering based on opportunistic information sharing in mobile ad-hoc networks, in: Proceedings of the 2007 OTM Confederated Int. Conf., Springer Berlin Heidelberg, 2007, pp. 408–416. doi:10.5555/1784607.1784643.
[31] S. Rose, D. Engel, N. Cramer, W. Cowley, Automatic keyword extraction from individual documents, in: Text Mining, John Wiley & Sons Ltd., 2010, pp. 1–20. doi:10.1002/9780470689646.ch1.
[32] S. R. El-Beltagy, A. Rafea, KP-Miner: A keyphrase extraction system for English and Arabic documents, Information Systems 34 (2009) 132–144. doi:10.1016/j.is.2008.05.002.
[33] R. Mihalcea, P. Tarau, TextRank: Bringing order into text, in: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, ACL, 2004, pp. 404–411. URL: https://aclanthology.org/W04-3252.
[34] A. Bougouin, F. Boudin, B. Daille, TopicRank: Graph-based topic ranking for keyphrase extraction, in: Proceedings of the 6th International Joint Conference on Natural Language Processing, AFNLP, 2013, pp. 543–551. URL: https://aclanthology.org/I13-1062.
[35] A. Salle, A. Villavicencio, Incorporating subword information into matrix factorization word embeddings, in: Proceedings of the 2nd Workshop on Subword/Character Level Models, ACL, 2018, pp. 66–71. doi:10.18653/v1/W18-1209.
[36] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, in: Workshop Track Proceedings of the 1st International Conference on Learning Representations, 2013. URL: https://arxiv.org/abs/1301.3781.
[37] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, ACL, 2014, pp. 1532–1543. doi:10.3115/v1/D14-1162.
[38] O. Levy, Y. Goldberg, Linguistic regularities in sparse and explicit word representations, in: Proceedings of the 18th Conference on Computational Natural Language Learning, ACL, 2014, pp. 171–180. doi:10.3115/v1/W14-1618.
[39] M. Nickel, D. Kiela, Poincaré embeddings for learning hierarchical representations, Advances in Neural Information Processing Systems 30 (2017) 6338–6347. URL: https://proceedings.neurips.cc/paper/2017/hash/59dfa2df42d9e3d41f5b02bfc32229dd-Abstract.html.