A Cross-Cultural Analysis of Explanations
                                     for Product Reviews

John O’Donovan
Dept. of Computer Science
University of California, Santa Barbara, CA, USA
jod@cs.ucsb.edu

Shinsuke Nakajima
Faculty of Computer Science and Engineering
Kyoto Sangyo University, Kyoto, Japan
nakajima@cse.kyoto-su.ac.jp

Tobias Höllerer
Dept. of Computer Science
University of California, Santa Barbara, CA, USA
holl@cs.ucsb.edu

Mayumi Ueda
Faculty of Economics
University of Marketing and Distribution Sciences, Kobe, Japan
Mayumi Ueda@red.umds.ac.jp

Yuuki Matsunami
Faculty of Computer Science and Engineering,
Kyoto Sangyo University,
g1245108@cc.kyoto-su.ac.jp

Byungkyu Kang
Dept. of Computer Science
University of California, Santa Barbara, CA, USA
bkang@cs.ucsb.edu


ABSTRACT
Cosmetic products are inherently personal. Many people rely on product reviews when choosing to purchase cosmetics. However, reviewers can have tastes that vary based on personal, demographic or cultural background. Prior work has discussed methods for generating attribute-based explanations for item ratings on cosmetic products, based on associated text-based reviews. This paper focuses on evaluating explanation interfaces for product reviews and related attributes. We present the results of a cross-cultural user study that evaluates five associated explanation interfaces for cosmetic product reviews across groups of participants from three different cultural backgrounds. We applied a 3 by 2 within-subjects experimental design in a user study (N=150) to evaluate effects of UI design and personalization on a range of user experience metrics in a cosmetics shopping scenario. Results of the study show that 1) Korean and Japanese speakers chose the most complex UI more often than English speakers; 2) older participants also preferred more options in cosmetic product selection, regardless of cultural background; 3) personalization of product ratings did not show an effect on user experience; 4) attribute-based explanations were preferred over star ratings for all three cultures; 5) rating propensity evaluation showed that Japanese participants provided significantly higher ratings than Korean or English participants, and that females provided higher ratings than males, regardless of background.

CCS Concepts
•Human-centered computing → HCI design and evaluation methods; User models; User studies;

Keywords
User Experience, Explanation, Decision Making, User-Centric Evaluation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
IntRS 2016, September 16, 2016, Boston, MA, USA.
Copyright remains with the authors and/or original copyright holders, 2016.

1   Introduction
Over the last 25 years, recommender systems have attempted to help users find the right information at the right time [15]. More recently, the proliferation of e-commerce applications supports buying and selling products in the global market with relatively little effort. Increasingly, consumers are relying on customer reviews to inform purchasing decisions [8]. In many cases, product reviews are presented in summary form via mechanisms such as star ratings. Such representations, however, typically fail to capture the subtle opinions that exist in the accompanying text-based reviews. In this paper, we build on recent work that automatically extracts attributes and associated ratings from online product reviews [10]. In particular, we focus on understanding how visual representations of various types of extracted item ratings impact user experience and conversion likelihoods in an e-commerce setting, as exemplified in Figure 1. Motivated by recent research that shows the importance of user experience over traditional accuracy metrics in recommender systems [7], we conduct a user experiment to understand how rating display affects user experience. Specifically, we applied a 3 by 2 within-subjects design (Table 1) in an online study (N=150) to evaluate effects of UI design and personalization on user experience metrics in a cosmetics shopping scenario, considering the following research questions:

R1: Do cross-cultural preference differences exist for recommendation interfaces? If so, what are the key predictors of these differences?

R2: Are there cross-cultural preference differences for personalized vs. non-personalized recommender system interfaces?

R3: Are there cross-cultural preference differences between traditional (star-rating) and more granular attribute-based recommender system interfaces?

R4: Are there differences in rating propensities across the
three cultures? If so, what are the strongest predictors of observed rating shifts?

  The cosmetics domain was used for this study, since cosmetics are sold globally and are inherently personal in nature. To explore variances in opinions on the explanation interfaces across different cultural backgrounds, participant groups were sourced from American, Japanese and Korean cultural backgrounds. These particular groups were selected as a representative sample with diverse cultures, and because they are among the fastest growing markets for cosmetics.¹

¹ http://polishcosmetics.pl/Korean-Market-Analysis.pdf

2   Related Work
In this study, we focus on explanations and transparency of recommender systems and on the (associated) role of product attributes mined from product reviews. Here, we discuss related work in these areas.

Product Attributes. To understand consumer behavior in economics, research has focused on the different attributes and uncertainties that consumers consider when purchasing a product [8, 13]. For buyers, these attributes play important roles when deciding to purchase a product. More importantly, attributes vary widely across product types and users’ personal tastes. For example, [3] study the effects of search attributes and provide a comparison between traditional and online supermarkets. A recent study on description and performance uncertainty [4] focused on the difficulty in assessing the product’s characteristics. Building on works such as [13] that show advantages of using fine-grained product attributes in the recommendation process, we aim to further our understanding of the role of fine-grained product attribute ratings in consumer decisions.

Explanation and Transparency in Recommendation. Within the recommender systems research community, there is an increasing understanding of the need for user-centered evaluations [12]. Recent keynote talks [2] and workshops [14] have helped to highlight the importance of this topic. In this paper, we follow Knijnenburg et al.’s [9] argument for a framework that takes a user-centric approach to recommender system evaluation, beyond the scope of recommendation accuracy. In contrast to that work, however, we argue that decision quality is an important evaluation metric that goes beyond the user experience metrics described in [9], and further, that it can be used to explain observed usage patterns for search and recommendation tools. Garcia-Molina [6] described differences and similarities between search and recommendation, and argued that interactive interfaces can help users understand and use these tools in more efficient ways. In the same vein, it has also been recognized that many recommender systems function as black boxes, providing no transparency into the working of the recommendation process, nor offering any additional information to accompany the recommendations beyond the recommendations themselves [7]. To address this issue, static or interactive/conversational explanations can be given to improve the transparency and control of recommender systems. Research on textual explanations in recommender systems to date has been evaluated in a wide range of domains (varying from movies to financial advice [5]). From a cross-cultural perspective, Pu and Chen performed a related study that evaluated perceptions of different recommendation interfaces in [1], using subjects from Chinese and Swiss backgrounds. In contrast to their study, which compared a novel UI against a list view and assessed user experience metrics, we focus on the perception of attribute ratings versus traditional, less fine-grained ratings, and on the impact of personalization on these perceptions. A second contrast to [1] is that our work explores rating propensity across the different groups.

3   Mining Attribute Ratings
This study builds on recent work [10] on attribute extraction from online product reviews. Specifically, we posit that more explanations of a given product in the form of multiple attributes with corresponding scores (on five-star rating scales), see Figure 1, can provide benefits to potential customers. In the prototype of the proposed recommender system, both personalized information (Simgroup ratings: “Users similar to you rate this item as”) and multiple product attributes extracted from a review text are added as features. Through an online user study, we apply both novel approaches as controlled variables to the prototype design and investigate the preference of the users for such features across demographic backgrounds, particularly cultural backgrounds (English, Korean and Japanese).

4   Interface Design

Figure 1: Screenshots of the interface used in the online user study. The annotations A-E show the items that varied in each condition, as shown in Table 1.
[Screenshot: review page for “Good Moisturizing Lotion” shown in English (EN), Japanese (JP) and Korean (KO). Annotated regions: A, customer review text with a “Read More” link; mean rating (B~E); simgroup rating (C, E), labeled “Users similar to you rate this item as”; attribute weights (D, E), labeled “This reviewer rate this item as”, covering Moisturizing, Tightening skin, Anti-aging, Cost, Organic, Brand and Scent.]

We designed a novel user interface for product review pages based on the feedback we received from a preliminary user study (N=100). We performed the study with a simple design layout to test the different visual conditions outlined in Table 1. Participants gave feedback on their preference for each UI in a virtual shopping scenario. They were also required to leave a comment on the interface design. For example, they reported the benefit of the new features, such as “I like the level of detail it has related to the product”, and suggested preferred features, such as “More alive colors” / “More explanations and ratings”. The collection of 100 comments was manually assessed, and improvements were made to the UI, including shortened review text with a “read more” button and a breakdown of multiple attributes extracted from the review text on stars. The revised design is shown in Figure 1.

5   Experimental Setup
Figure 1 shows an example of the refactored interface for a sample product review. To test our hypotheses above, a 3x2 within-subjects experiment was conducted, controlling for personalization and rating type, as shown in Table 1.

Figure 2: Preferred User Interface by Culture.

Figure 3: Preferred User Interface by Age.

Figure 4: Cross-cultural perspective of helpfulness of the five evaluated interfaces.
[Plot: y-axis “value” (ticks 2 to 6); x-axis “variable” with Text Only (A), NP-Star (B), P-Star (C), NP-Attr (D), P-Attr (E); legend “country”: English, Japanese, Korean.]

[Figure 5 plot: three panels, Mean Rating of Frown (5-point Likert scale), Neutral Expression (5-point Likert) and Mean Rating of Smile (5-point Likert), each plotted by Gender: Female (n=53), Male (n=23).]

Table 1: Overview of the controlled variables for the online user study.

UI Config                | non-personalized                      | personalized
                         | (no information from similar users)   | (with social data from similar users)
-------------------------+---------------------------------------+---------------------------------------
review text only         | A: product review text (spans both columns)
-------------------------+---------------------------------------+---------------------------------------
review text with star    | B: A + mean rating on stars           | C: A + mean rating and the rating
rating                   |                                       | from active user’s simgroup on stars
-------------------------+---------------------------------------+---------------------------------------
review text, star rating | D: B + attribute weights computed     | E: C + attribute weights computed
and attributes           | from current review text (on stars)   | from current review text (on stars)
The study (N=150) was performed on the crowdsourcing              Figure 5: Di↵erence in rating propensity by gender.
platform, Amazon Mechanical Turk. Each participant was
shown a randomly ordered set of 5 di↵erent design layouts
                                                                  accordingly, an increased need to explore user ratings on fine
corresponding to the treatments in Table 1, and were asked
                                                                  grained product attributes (see Figure 3).
to rank them in order of preference. They were also asked
to rate the helpfulness of each. Participants were evenly         Personalization and Rating Type Figure 4 shows the re-
balanced across cultural backgrounds. All participants were       sults of perceived usefulness of the interfaces, broken down
shown with the five interfaces in random order. The content       by cultural groupings. Each UI condition is shown as a group
was shown in their primary language based on their cultural       on the x-axis, and each group contains the mean utility score
background. Overall, participants took between 5-10 min-          for the three cultural groups. The x-axis groups (UI treat-
utes doing the study, and were paid $1.50 for their time.         ments) are also ranked from left to right based on number of
Questions were added to test for user attention level and for     visible features (UI complexity). This graph shows several
language proficiency, including identification of di↵erences      interesting e↵ects: first, there is a general preference across
between UIs and simple math questions written in the ap-          all groups for the attribute-based representations (groups D
propriate language. After filtering our data based on these       and E, on the right side), over less granular, star-ratings
metrics, group sizes were 39, 25 and 12 for English, Japanese     or text-based UIs. This is a promising result that indicates
and Korean, respectively. Participant age ranged between          that attribute extraction and visualization has a positive ef-
18-64 with an average of 26. Gender groups were not evenly        fect on Ux. The second interesting result is that within the
distributed, as expected for the cosmetics domain, with 70%       star-rating group (2nd and 3rd group) and the attribute-
female and 30% male.                                              rating (4th and 5th) groups there is no notable di↵erence
6 Results                                                         between the personalized and non-personalized treatments.
                                                                  This result tells us that the granularity of presented ratings
Perception and Rating Differences Figure 2 shows the              has more positive impact on user experience than the percep-
results for the UI ranking task, broken down by age. The          tion that the ratings come from similar users. To investigate
result shows a clear preference for design E in all groups, but   this result in more depth, a followup experiment is planned
there is a significant increase in that preference for partici-   with a large corpus of product reviews collected from Ama-
pants over 40 (shown on the right side). This e↵ect was also      zon.com [11] 2 to compute actual similarity scores based on
seen from 100 participants in the preliminary study. Inter-       user profiles. This would clearly give better insight into the
face E, shown in Figure 1, shows the most information, and        observed e↵ect. Figure 4 also answers R2, in that there are
allows users to understand how users similar to them rate         no significant di↵erences between the cultural groups within
individual product attributes. This e↵ect might be a result
                                                                     2
of specific preferences for cosmetics developing with age, and                                        http://jmcauley.ucsd.edu/data/amazon/
each UI treatment, although the Japanese group showed a trend towards favoring the more complex UI treatments.

Figure 6: Difference in rating propensity by culture. (Three panels: mean rating of the frown, neutral and
smile expressions on a 5-point Likert scale; x-axis: Culture; En n=39, Jp n=25, Ko n=12.)

Rating Propensity. For some users, rating an item with a specific number of stars can have very different
meanings. User ratings on items serve as the basis for most collaborative recommendation techniques, but these
techniques tend to ignore such differences when computing neighborhoods for recommendation. Further, little
work has been done to understand cross-cultural differences in rating propensity. Since these participant
groupings were available in our experimental setup, a logical step was to evaluate rating propensities within
each of the cultural groups, to serve both as an independent result and as a weighting factor for the analysis
in Figure 4. Each participant was shown three randomly ordered faces with happy, neutral and sad expressions,
and was asked to rate the ‘happiness’ perceived in each on a five-point Likert scale. Figure 5 shows the
results by gender (for all groups). Interestingly, there is a trend for females to rate higher than males, and
the difference becomes more pronounced for the ‘happy’ expression, shown in the rightmost plot of Figure 5,
with a mean difference of 0.7 (relative increase of 16%, p<0.005). Figure 6 shows the results of the rating
propensity analysis broken down by cultural group. Again, the graphs represent mean ratings for the sad,
neutral and happy expressions from left to right, respectively. Here, we see a clear trend for higher ratings
in the Japanese group across all three expressions. While this is only a small-scale initial study, we believe
that this is an important result for the study of recommender system performance across different cultures in
general, and a follow-up study on propensity of ratings for recommender systems is planned to investigate this
further.

7 Discussion and Future Work

This study applied a 3 by 2 within-subjects experimental design in a user study (N=150) to evaluate effects of
UI design and personalization on a range of user experience metrics in a cosmetics shopping scenario, using
participant groups from three different cultural backgrounds. Results of the study show that 1) Korean and
Japanese speakers chose the most complex UI more often than English speakers; 2) older participants also
preferred more options in cosmetic product selection, regardless of cultural background; 3) personalization of
product ratings did not show an effect on user experience; 4) attribute-based explanations were preferred over
star ratings for all three cultures; and 5) the rating propensity evaluation showed that Japanese participants
gave significantly higher ratings than Korean or English participants, and that females provided higher ratings
than males, regardless of background. A clear next step is to evaluate on real product data. The authors plan a
follow-up study to compare LDA and dictionary-based approaches to product attribute extraction, and to explore
how the resulting attributes can improve explanations and user profiles for collaborative filtering.
Additionally, a more detailed evaluation of the different rating propensities across cultures is underway
using a larger number of participants and multiple product domains.

8 References
[1] L. Chen and P. Pu. A cross-cultural user evaluation of product recommender interfaces. In Proceedings of
    the 2008 ACM Conference on Recommender Systems, RecSys ’08, pages 75–82, New York, NY, USA, 2008. ACM.
[2] E. H. Chi. Blurring of the boundary between interactive search and recommendation. In Proceedings of the
    20th International Conference on Intelligent User Interfaces, pages 2–2. ACM, 2015.
[3] A. M. Degeratu, A. Rangaswamy, and J. Wu. Consumer choice behavior in online and traditional supermarkets:
    The effects of brand name, price, and other search attributes. International Journal of Research in
    Marketing, 17(1):55–78, 2000.
[4] A. Dimoka, Y. Hong, and P. A. Pavlou. On product uncertainty in online markets: Theory and evidence. MIS
    Quarterly, 36, 2012.
[5] A. Felfernig, E. Teppan, and B. Gula. Knowledge-based recommender technologies for marketing and sales.
    Int. J. Patt. Recogn. Artif. Intell., 21:333–355, 2007.
[6] H. Garcia-Molina. Thoughts on the future of recommender systems. In Proceedings of the 8th ACM Conference
    on Recommender Systems, pages 1–2. ACM, 2014.
[7] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In ACM
    Conference on Computer Supported Cooperative Work, pages 241–250, 2000.
[8] Y. Kim and R. Krishnan. On product-level uncertainty and online purchase behavior: An empirical analysis.
    Management Science, 61(10):2449–2467, 2015.
[9] B. P. Knijnenburg, M. C. Willemsen, Z. Gantner, H. Soncu, and C. Newell. Explaining the user experience of
    recommender systems. User Modeling and User-Adapted Interaction, 22(4-5):441–504, 2012.
[10] Y. Matsunami, M. Ueda, S. Nakajima, T. Hashikami, S. Iwasaki, J. O’Donovan, and B. Kang. A method for
    automatic scoring of various aspects of cosmetic item review texts based on evaluation expression
    dictionary. In Proceedings of the 24th International MultiConference of Engineers and Computer Scientists,
    IMECS ’16, pages 392–397. IAENG, 2016.
[11] J. McAuley and A. Yang. Addressing Complex and Subjective Product-Related Queries with Customer Reviews.
    ArXiv e-prints, Dec. 2015.
[12] S. M. McNee, J. Riedl, and J. A. Konstan. Being accurate is not enough: How accuracy metrics have hurt
    recommender systems. In Extended Abstracts of the 2006 ACM Conference on Human Factors in Computing
    Systems (CHI 2006), 2006.
[13] J. O’Donovan, B. Smyth, V. Evrim, and D. McLeod. Extracting and visualizing trust relationships from
    online auction feedback comments. In IJCAI, pages 2826–2831, 2007.
[14] J. O’Donovan, N. Tintarev, A. Felfernig, P. Brusilovsky, G. Semeraro, and P. Lops. Joint workshop on
    interfaces and human decision making for recommender systems (intrs). In H. Werthner, M. Zanker,
    J. Golbeck, and G. Semeraro, editors, RecSys, pages 347–348. ACM, 2015.
[15] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. Grouplens: An open architecture for
    collaborative filtering of netnews. In Proceedings of ACM CSCW’94 Conference on Computer-Supported
    Cooperative Work, pages 175–186, 1994.