=Paper=
{{Paper
|id=Vol-2974/paper
|storemode=property
|title=Addressing Overchoice: Automatically Generating Meaningful Filters from Hotel Reviews
|pdfUrl=https://ceur-ws.org/Vol-2974/paper4.pdf
|volume=Vol-2974
|authors=István Varga,Yuta Hayashibe
|dblpUrl=https://dblp.org/rec/conf/rectour/VargaH21
}}
==Addressing Overchoice: Automatically Generating Meaningful Filters from Hotel Reviews==
ISTVÁN VARGA, Megagon Labs, Tokyo, Japan, Recruit Co., Ltd., Japan

YUTA HAYASHIBE, Megagon Labs, Tokyo, Japan, Recruit Co., Ltd., Japan

In this paper we present a hotel filter recommendation method designed to address the cognitive load users face in an overchoice scenario. As online products and services continuously diversify, user needs are also becoming increasingly sophisticated. However, with more items to choose from, grasping the entire choice set and differentiating among all matching options becomes increasingly difficult, leading to sub-optimal outcomes. Conventional hotel reservation platforms provide a limited set of additional filters, but these cannot accommodate all intricate user needs. Employing natural language processing and machine learning techniques, we provide a simple framework that identifies meaningful filters from customer reviews. We define criteria and scoring methods to acquire relevant and interesting filters that may help customers refine their needs or even identify hidden, previously unknown ones. Our simulated user experiments show that our proposal is capable of identifying intricate and useful filters, leading to increased customer satisfaction.

CCS Concepts: • Computing methodologies → Machine learning; • Information systems → Content ranking; Recommender systems; Rank aggregation; Similarity measures.

Additional Key Words and Phrases: overchoice, clustering, filter recommendation

1 INTRODUCTION

Online services have become not only ubiquitous, but indispensable in almost every aspect of our lives. Nearly every imaginable product or service is available through e-commerce transactions, ranging from online shopping and restaurant or hotel reservation to matchmaking.
In a conventional hotel reservation service the customer is provided with an interface that facilitates search using some of the most crucial criteria, typically objective queries that are meant to reduce the choice set to a manageable size (e.g., number of visitors, length of stay, location, etc.). The emergence of e-commerce systems and online reservation services brought the advantage of an increased selection only a few clicks away. Both classic economics and psychology emphasize the benefits of a larger number of choices [34, 41, 42]. However, it also raised a number of important challenges. One such challenge is that the size of the choice set can become a cognitive load in the decision making process. Overchoice, or having too many choices, can be detrimental, leading to anxiety or depression [25, 45, 48]. Even though a larger number of choices is initially appealing, consumers may feel less satisfied or less convinced that they actually made the best decision available [26]. Recent studies even suggest an inverted U-shaped relationship between customer commitment and the number of available choices [46]: customers are more likely to find an item to their liking as the number of choices grows, but start to have difficulties when multiple items fit their needs. Furthermore, with continuous product diversification, user queries are also becoming ever more refined, making customer satisfaction increasingly difficult to achieve [16, 44]. To address these refined customer expectations, hotel reservation services provide faceted search functions, i.e., additional filters (e.g., free breakfast or late check-out): sets of objective options that serve as potential additional queries to reduce the choice set.
Authors’ addresses: István Varga, istvan@megagon.ai, Megagon Labs, Tokyo, Japan, Recruit Co., Ltd., 7-3-5 Ginza Hulic GINZA7 Bld 3F Chuo-ku, Tokyo, Japan, 104-8227; Yuta Hayashibe, hayashibe@megagon.ai, Megagon Labs, Tokyo, Japan, Recruit Co., Ltd., 7-3-5 Ginza Hulic GINZA7 Bld 3F Chuo-ku, Tokyo, Japan, 104-8227. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Such filters range from being just a handful of pre-defined, static options to sometimes even thousands of carefully curated ones, built up over the course of several years [3]. However, continuously updating such a set of filters in response to product diversification and evolving customer expectations can be extremely costly. Moreover, navigating through a large set of filters can itself become a burden, defeating its very purpose [3]. In our work we provide a simple framework for automatically acquiring filters related to the hotels that match the customer’s initial query, by identifying useful mentions in customer reviews. We especially focus on customer experiences that are potentially both relevant and interesting for other customers, while also having the capability of reducing the choice set in an intuitive and natural way. The main contributions of this paper are the following:

(1) In order to address the overchoice problem, we present a simple clustering-based approach to identify useful filters from customer reviews in a dynamic manner.

(2) We define key concepts and strategies for scoring and ranking filters that are meaningful and natural for the customer.

(3) We present simple but efficient methods to implement filter scoring and ranking.

(4) We validate our proposal through a series of user experiments. We found that subjective, experience-based filters that express quality judgements were especially useful for potential users to narrow down the search space.
The paper is organized as follows: in Section 2 we discuss the related work, followed by the definition of the key concepts of our approach in Section 3, the data description in Section 4 and the details of our proposal in Section 5. We describe our experiments in Section 6, followed by a discussion with future directions in Section 7 and the concluding remarks in Section 8.

2 RELATED WORK

Automatic facet generation is a field closely related to our task. Faceted search augments traditional search by presenting a set of attributes or filters that are grouped into facets, allowing customers to narrow down the search results [8, 10, 22, 37]. Manual curation and continuous updating of facets can be extremely costly [3], thus automatic methods to identify and rank filters have been proposed [18, 27]. Our work differs in three main aspects from automatic facet generation. Firstly, facet generation methods employ knowledge bases to maintain a well organized structure of facets [18, 27]. Our method does not employ structured knowledge bases; instead, we rely only on customer reviews. Secondly, faceted search typically targets objective filters to populate facets. Our work, besides objective filters, identifies subjective filters as well, crucial in expressing unique experiences that might be of value for new potential customers. Thirdly, compared to faceted search, our method puts an emphasis on addressing overchoice. The exploratory search nature of faceted search does address overchoice, but navigation through a large set of facets sometimes becomes a burden in itself, defeating its very purpose [3]. Our method has the option of providing only a handful of potentially meaningful and unique filters that can reduce the choice set, without putting extra burden on the customer. As another method to reduce information overload, customer review summarization is also related to our field [9, 11, 23, 38].
Our work mainly differs from review summarization in that we attempt to identify filters that are common across multiple items, whereas review summarization mainly focuses on identifying the main characteristics of individual items. Customer reviews have also been the target of sentiment analysis [4], aspect based opinion mining [43, 54] and feature based ranking [53]. Similarly to review summarization, these methods focus on the reviews of single items, as opposed to identifying common but meaningful characteristics across multiple items. Also, customer reviews can be employed to generate recommendations [17, 47]. These methods rely on customer logs and information extracted from reviews to recommend items that are similar to previously liked ones. Our work does not assume the existence of previous customer logs. Published work on query suggestion and recommendation has prominently focused on the web domain [2, 24, 31], with recent focus on e-commerce product search [19] or news related content [12]. Typically these works employ knowledge bases [19, 24] or customer action logs [2, 24, 31] to suggest queries that are relevant to the original user query. Our method differs in two key aspects. First, our target is not to suggest similar queries or filters; instead, we attempt to provide useful filters that are not restricted to being related to the original user query. Second, we only utilize customer reviews, without employing knowledge bases or customer action logs. Related to query recommendation is the field of query rewriting, the task of reformulating customer queries into well-formed ones in order to improve customer experience [50, 52].
It differs from our work in that query rewriting does not attempt to recommend new filters or queries to the customer. Interestingness or uniqueness discovery, a key concept in our work, is another related field, with special focus on news articles [14, 28, 36], but definitions of uniqueness often contain heavily domain dependent elements, such as article freshness [14, 28] or differences in events that occur before and after publication [36], which are not applicable in our domain. A more robust method is presented in [39], where the authors define the interestingness of articles as a combination of multiple features, such as topic relevancy, source reputation, writing style or freshness. The main difference from our work is that our target for uniqueness is simple sentences, rather than full articles. On a side note, the field of anomaly detection [6, 7] is also related to the concept of interestingness. However, unique or interesting in our context does not go as far as being abnormal, as in Hawkins’s [20] definition of an outlier¹.

3 KEY CONCEPTS

Our goal is to automatically identify filters that are characterized by: (1) being appealing to the customer; (2) having the potential of addressing the overchoice problem by reducing the choice set in an intuitive and natural way. We define a filter as “appealing” when it is both relevant and unique. Also, we define a filter set as “appealing” when it is diversified, without too much emphasis on a single topic or aspect. Furthermore, to perform choice set reduction in an intuitive way, we introduce size control policies. Relevance, uniqueness, diversity and size control are the key concepts of our proposal. We employ size control policies and diversity rules as hard constraints to identify possible filters, while using relevance and uniqueness scores to determine the final filter ranking.

3.1 Relevance

Filters are required to hold enough decision power in order to be viable expressions of user intent.
Pre-defined static filters of conventional hotel reservation platforms are good examples of high relevance (e.g., breakfast included, late check-out). We attempt to assign relevance scores to all possible filters. While relevance is highly subjective, we can argue that, all else being equal, certain filters satisfy a larger audience than others (e.g., close to the city center versus bright pink curtains). Detailed information about relevance scoring can be found in Section 5.2.1.

¹ “an observation that deviates so significantly from other observations as to arouse suspicion that it was generated by a different mechanism”

3.2 Uniqueness

Filters are also required to be representative of the choice set that matches the customer’s original query, capturing characteristics that are unique within the search results. The motivation behind uniqueness is to identify options that are especially appealing within the hotels that already match the user query (e.g., next to the city aquarium), with the added potential of offering choices previously unknown to the customer (e.g., private hot-spring). Section 5.2.2 offers a detailed description of uniqueness scoring.

3.3 Diversity

The importance of diversity and serendipity is well recognized in the context of recommender systems [5, 30, 33]. Studies also point out that decision making factors are sometimes not even part of the original query [1]. As a result, we argue that, especially in cold start situations, a diversified set of filters that covers a wide range of topics is more suitable to accommodate customer needs than filters biased towards one or a few topics. More information about our approach to acquiring a diversified set of filters can be found in Section 5.1.
3.4 Size control

By definition, filters are designed to address overchoice and reduce the choice set, i.e., the number of matching hotels. We argue that the degree of the size reduction is also important. Providing highly appealing but too generic or too specific filters might result in a too drastic or too shallow choice set reduction, leading to customer dissatisfaction. Instead, our strategy is to identify and provide only the filters that are guaranteed to result in a “just right” window of matching hotels, compared to the original number of matching hotels. Naturally, this implies that our filters are based on availability, i.e., filters that obey the size control rules are guaranteed to reduce the choice set. Intuitively, in practice this should provide a natural way of reducing the choice set, balancing between relevance and uniqueness. However, more often than not, relevance and uniqueness work against each other. Highly relevant filters are often not very unique (e.g., free continental breakfast), while highly unique filters may not be relevant to a large audience (e.g., stay at a Buddhist temple). When the choice set is large, arguably it is more natural to select from more generic, thus high relevance, low uniqueness filters, with the preference shifting towards high uniqueness, low relevance filters as the choice set decreases. With a large choice set, size control policies rule out filters that are not frequent enough, thus disregarding long-tail but unique filters, with higher relevance ones gaining more prominence. With a decreasing choice set, long-tail, unique filters gain more exposure at the expense of more generic, relevant filters. More information about size control policies can be found in Section 5.1.2.

4 HOTEL REVIEWS AS DATA SOURCE

As our data source we use over 20 million sentences extracted from hotel reviews, collected from one of the largest hotel booking sites in Japan².
The hotel review corpus contains the customer review texts and the location data associated with each hotel.

² jalan.net

Table 1. Predicate-argument structures and their extracted cores

- Core: delicious food — very delicious food; really delicious food; all food is delicious; food is of course delicious; delicious food as advertised; more than delicious food
- Core: close to the station — hotel is close to the station; really close to the station; close to the station as mentioned; closest to the station; pretty close to the station
- Core: rooms clean — extremely clean rooms; very clean rooms; rooms clean as always; rooms are of course clean; thoroughly cleaned rooms; rooms cleaned to the last detail

4.1 Filter units

An underlying assumption of our method is that a user friendly filter extracted from customer reviews can be represented by a simple predicate-argument structure. To this end, we extracted over 20 million predicate-argument structures from our corpus using JUMAN++ (v2.0.0-rc3), a Japanese morphological analyzer [49], and KNP++ (v0.9-21cc58c), a Japanese dependency and case structure analyzer [29]. We modified the case structure analyzer to retain only the core arguments of the predicates, discarding subtle nuances (e.g., modifiers, adverbs, adjectives, adverbial or adjective phrases, etc.) that are not relevant in the context of user friendly filters. To this end, we retained the arguments that mark the most essential Japanese grammatical cases: nominative, accusative, dative, instrumental, and the Japanese topic marker³. Table 1 illustrates some examples of core predicate-argument structures together with their original forms before the discarding process.
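To make the discarding step concrete, the sketch below shows how a core could be extracted from an already-parsed predicate-argument structure. This is a hypothetical illustration, not the actual JUMAN++/KNP++ pipeline: the data representation and the `extract_core` helper are our own simplification, with the case whitelist taken from footnote 3.

```python
# Particles marking the retained core cases (footnote 3): ga (nominative),
# wo (accusative), ni (dative), de (instrumental) and the topic marker ha.
CORE_CASES = {"ga", "wo", "ni", "de", "ha"}

def extract_core(predicate, arguments):
    """Drop modifiers and adjuncts, keeping only arguments in core cases.

    `arguments` is a list of (surface_form, case_marker) tuples produced by
    a case structure analyzer. Returns the core predicate-argument structure
    as a tuple, or None when no core argument survives the discarding.
    """
    core_args = tuple(surface for surface, case in arguments
                      if case in CORE_CASES)
    if not core_args:
        return None
    return core_args + (predicate,)

# Toy example: "very delicious food" and "delicious food as advertised"
# collapse to the same core, mirroring the rows of Table 1.
a = extract_core("is-delicious", [("food", "ga"), ("very", "ADV")])
b = extract_core("is-delicious", [("food", "ga"), ("as-advertised", "ADV")])
assert a == b == ("food", "is-delicious")
```

Grouping surface variants under a shared core in this way is what allows frequency counting over cores rather than over raw phrases.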
Some of the resulting predicate-argument structures were unrelated to hotels or had negative polarity, making them unsuitable for our filter policies. As a result, we employed a filtering method based on the automatic classification results of two BERT-based classifiers fine-tuned on an annotated corpus⁴ [21] to identify non-negative predicate-argument structures relevant to the hotel or its services. As the pre-trained model we used a BERT model trained on our hotel review corpus; for more information about this model refer to Section 4.2. Finally, we retained only the core predicate-argument structures whose frequency is at least 5 in our corpus. As a result of the above processes, we retained 167,886 unique non-negative core predicate-argument structures.

³ We retained the arguments that were marked by the Japanese particles ga, wo, ni, de and ha.
⁴ https://github.com/megagonlabs/jrte-corpus

4.2 Filter representation

To represent filters, we pre-trained a BERT [15] model on our review corpus, following the methodology described in [21]. The authors of [21] employ SentencePiece [32], an unsupervised text tokenizer which learns sentence units for a predetermined vocabulary size. We set the vocabulary size to 32,000. To train the BERT model, we used the parameter values officially distributed with BERT-Base: we set the batch size to 512, the number of attention heads to 12, the number of layers to 12, and the number of hidden layers to 12. We trained the BERT model for 1,500,000 steps using TPUs. To improve on BERT’s embeddings, we employed the sentence embedding framework described in [40], using the triplet loss function to fine-tune our pre-trained model. The triplet loss function requires a triplet of (anchor, positive, negative) sentences where the (anchor, positive) tuple is a positive pair, while the (anchor, negative) tuple is a negative pair. As input triplets for fine-tuning our pre-trained model, we employed the simple tf-idf based word2vec sentence representation described in [35] for each filter, randomly selecting 30,000 triplets whose (anchor, positive) pair had a cosine similarity larger than 0.85, and whose (anchor, negative) cosine similarity was smaller than 0.20.

5 PROPOSED METHOD

We developed machine learning based methods to identify and rank filters. Given a set of hotels that match an initial set of original user queries, we automatically extract the non-negative core predicate-argument structures described in Section 4.1 from the customer reviews of the matching hotels. These core predicate-argument structures act as potential filters. First, using the sentence embedding representations of the filters, we apply a 2-stage hierarchical clustering method to group them into semantically similar clusters. In this step we employ policies to identify clusters that follow size control restrictions. Next, we score each cluster for relevance and uniqueness to determine the final filter cluster ranking. In this step we employ diversity policies, and we also label the top ranked filter clusters. Below is a detailed description of each step.

5.1 2-stage clustering

5.1.1 Stage 1: main topic identification. In the first stage of clustering we attempt to group filters into main latent topics, e.g., food, location, hot spring, etc. The purpose is to serve diversity by identifying such latent topics, the assumption being that filters from different clusters after stage 1 will roughly have different topics⁵. In order to identify the main topics, we employ Ward’s agglomerative clustering method [51] with complete linkage and cosine similarity as the metric. As the feature representation of the filters, we used the sentence embeddings described in Section 4.2.
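A minimal sketch of the two clustering ingredients, assuming precomputed filter embeddings. SciPy's complete-linkage hierarchical clustering on cosine distance serves as a stand-in for the actual implementation, and `size_ok` encodes the size control rule introduced in Section 3.4; the function names and defaults shown are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def stage1_topics(embeddings, sim_threshold=0.5):
    """Stage 1: group filter embeddings into latent topics.

    Agglomerative clustering on cosine distance; the dendrogram is cut
    where linkage similarity drops below sim_threshold.
    """
    dists = pdist(np.asarray(embeddings, dtype=float), metric="cosine")
    tree = linkage(dists, method="complete")
    return fcluster(tree, t=1.0 - sim_threshold, criterion="distance")

def size_ok(cluster_hotel_ids, choice_set_size, lower_bound=0.30, upper_bound=0.70):
    """Size control: keep a filter cluster only when the hotels it links to
    cover a 'just right' share of the original choice set."""
    share = len(set(cluster_hotel_ids)) / choice_set_size
    return lower_bound <= share <= upper_bound

# Toy example: two obvious latent topics in a 2-d embedding space.
emb = [[1.0, 0.0], [0.99, 0.10], [0.0, 1.0], [0.10, 0.99]]
topics = stage1_topics(emb)  # first two filters share a topic, last two share another
```

In practice the subtree traversal of Section 5.1.2 would call a predicate like `size_ok` at each level while moving up the cluster hierarchy.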
Empirical results showed that a similarity threshold of 0.5 resulted in latent topic clusters with a good trade-off between intra-cluster homogeneity and inter-cluster variance. We recognize that a carefully curated knowledge-driven approach may have the advantage of accurately associating pre-defined topics with filters. However, besides cost issues, our data-driven approach has the advantage of recognizing a potentially unlimited number of intrinsic topics that would be difficult to acquire manually. Also note that the purpose of the first stage is solely to associate filters with latent topics, thus this step can be performed beforehand, independently of user queries.

5.1.2 Stage 2: filter identification with size control. In the second clustering stage we identify filter clusters from each main topic of Section 5.1.1 whose sizes obey the size control rules. The size of a filter cluster is defined as the total number of hotels that the members of the cluster are linked to. Size control is governed by two parameters, lower_bound and upper_bound, that represent the lower bound percentage and upper bound percentage, respectively, with respect to the size of the original choice set. To achieve this, for each topic output by the main topic identification step described in Section 5.1.1, we parse the hierarchical subtree of the topic by incrementally moving up in the cluster hierarchy. During this process we retain clusters that obey the size control rules and stop where the linkage drops below a certain similarity threshold, empirically set to 0.7. Empirically, we set lower_bound and upper_bound to 30% and 70%, respectively.

5.2 Filter scoring

We score and rank the filter clusters retrieved in the 2-stage clustering step based on their relevance and uniqueness. After ranking, we apply diversity rules and label the top K filter clusters as described below.

⁵ Note that we do not attempt to label the resulting topics. Instead, we only attempt to identify filter groups that belong to the same latent topic.

Table 2. Relevance examples in the training data

- rentable private open-air bath: relevance score 5
- delicious dinner: 4
- great view: 3
- television available in the rooms: 2
- good water pressure: 1

5.2.1 Relevance score. In Section 3.1 we stressed the importance of discovering filters that carry enough weight in the decision making process. Determining relevance is a non-trivial task, since people’s preferences are obviously not uniform. A filter that is highly relevant for one customer may be less relevant for another (e.g., rich choice of baby formula or free pair ticket to the city aquarium), depending not only on personal preferences, but also on the situation or even the purpose of the visit. Since this is a cold start scenario, personalized relevance estimators based on customer action logs are not feasible. Instead, we define the relevance of a filter independently of the original user query, as the average of multiple subjective relevance scores. We pre-computed filter relevance scores using a simple k-nearest neighbor classifier. For each filter we took the top k = 5 similar filters from our training data and computed their average relevance, weighted by the similarity score, as shown in the formula below, where x denotes the target filter, x_i denotes filters of the training data, and relevance_gold denotes the gold relevance scores of the training data.

relevance(x) = ( Σ_{i=1}^{k} cossim(x, x_i) × relevance_gold(x_i) ) / ( Σ_{i=1}^{k} cossim(x, x_i) )    (1)

As the similarity score we employed cosine similarity, computed on the sentence embeddings described in Section 4.2. We normalized the relevance score by scaling it to between 0 and 1.
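The scoring formula of Eq. (1) can be sketched compactly, assuming embedding vectors for the target filter and the training filters; the helper name and array layout are ours, and the final rescaling to [0, 1] is omitted.

```python
import numpy as np

def knn_relevance(x, train_vecs, train_scores, k=5):
    """Eq. (1): cosine-similarity-weighted average of the gold relevance
    scores of the k most similar training filters."""
    x = np.asarray(x, dtype=float)
    train_vecs = np.asarray(train_vecs, dtype=float)
    train_scores = np.asarray(train_scores, dtype=float)
    # Cosine similarity of the target filter against every training filter.
    sims = train_vecs @ x / (np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(x))
    top = np.argsort(sims)[-k:]          # indices of the k nearest neighbors
    weights = sims[top]
    return float(np.dot(weights, train_scores[top]) / weights.sum())
```

For instance, a target filter whose two nearest training neighbors (at equal similarity) were scored 4 and 2 receives a relevance of 3.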
As training data we randomly selected 8000 filters and asked 5 crowd workers to label their degree of relevance from 5 to 1⁶. The most relevant filters were labeled with 5, the least relevant with 1. Ungrammatical or semantically unsound filters were labeled with 0. We calculated pairwise inter-annotator agreement using weighted Cohen’s kappa [13]. Kappa values were between 0.24 and 0.56, representing fair to moderate agreement, underlining the highly subjective nature of the task. For our classifier we used the filters that were judged as grammatically correct by at least 4 out of the 5 workers. For the grammatically correct filters we averaged the individual worker scores. We preferred to use the truncated mean (i.e., ignoring the lowest and highest scores of the 5 workers) in order to counter highly subjective relevance scores (e.g., rich choice of baby formula). Table 2 shows an excerpt of the filters and their respective averaged relevance scores. We evaluated our relevance classifier on held-out data of 1000 samples by calculating the precision at increasing error ranges. We achieved a precision of 60.40% when the error range between the estimated relevance and the reference relevance was less than or equal to 0.1 points, and 85.50% precision at a 0.2 point error range, as shown in Figure 1.

⁶ We manually selected the crowd workers based on their demographic information (i.e., gender, age range) to ensure diversity.

Fig. 1. Relevance evaluation: cumulative error range.

We compute relevance scores for filter clusters as the weighted average relevance of their member filters as shown in the formula below, where C_x denotes a filter cluster, relevance(x) denotes filter relevance and freq(x) denotes the frequency of filter x in the target choice set.
relevance(C_x) = ( Σ_{x∈C_x} freq(x) × relevance(x) ) / ( Σ_{x∈C_x} freq(x) )    (2)

5.2.2 Uniqueness score. We define the uniqueness of a filter as the property of being important within a selected group of hotels. We employed term frequency–inverse document frequency (tf-idf) as the uniqueness of each filter, where x denotes the filter and d denotes the reviews of a specific hotel.

uniqueness(x, d) = tf(x, d) × idf(x)    (3)

Intuitively, a unique characteristic of a subset is more dominant in the subset than within the entire population. The sparse nature of the filters makes it unfeasible to handle them individually, thus for the purpose of computing tf-idf, we employed Ward’s agglomerative clustering method [51] with complete linkage and cosine similarity as the metric, with a similarity threshold of 0.7, to group together filters with similar semantic properties. As a result, we clustered the filters into 5178 clusters and computed the tf-idf scores on the resulting clusters. Cluster members inherited the tf-idf values of their parent cluster. Similarly to the relevance scores, we computed the filter cluster uniqueness score as the weighted average uniqueness of its member filters as shown in the formula below, where C_x denotes a filter cluster, uniqueness(x, d) denotes the uniqueness of filter x in hotel review set d, and freq(x) denotes the frequency of filter x in the target choice set.

uniqueness(C_x, d) = ( Σ_{x∈C_x} freq(x) × uniqueness(x, d) ) / ( Σ_{x∈C_x} freq(x) )    (4)

5.2.3 Filter ranking and employing diversity rules. Filter cluster ranking is determined by multiplying the filter cluster’s relevance and uniqueness scores, weighted by their respective weights (i.e., α and β for relevance and uniqueness,
During preliminary empirical evaluations, we found that a reasonable value for 𝛼 and 𝛽 were 1 and 2, respectively. rank(𝐶𝑥 ) = (𝛼 + relevance(𝐶𝑥 )) × (𝛽 + uniqueness(𝐶𝑥 )) (5) To produce the final ranking, only the top 𝑘𝑑𝑖𝑣𝑒𝑟𝑠𝑖𝑡 𝑦 filter clusters are retained for each main topic described in Section 5.1. We set 𝑘𝑑𝑖𝑣𝑒𝑟𝑠𝑖𝑡 𝑦 to 1, e.g., we retain only the top filter from each topic7 . 5.2.4 Labeling filters. Finally we label the top 𝐾 filter clusters that will be presented for the customer. We perform cluster labeling by choosing the most representative member, i.e., the member closest to the cluster centroid. We utilize cosine similarity to determine the representative member that will act as the label of the filter cluster. 6 EXPERIMENTS We conducted a number of user experiments to evaluate: (1) the top 𝑘 𝑓 𝑖𝑙𝑡𝑒𝑟𝑠 overall output and (2) the top 𝑘 𝑓 𝑖𝑙𝑡𝑒𝑟𝑠 individual filter outputs of our proposed method, described in Section 5. Particularly, we compared the filters output by our proposed method against manually acquired filters. Also, we assessed the effect of uniqueness, relevance and diversity policies. To this end, we performed pairwise comparison against the following baseline models: • human: a manually compiled filter list described below in Section 6.1. • relevant: proposed without the uniqueness score, i.e., filter ranking is determined only by relevance. • unique: proposed without relevance score, i.e., filter ranking is determined only by uniqueness. • non-diverse: proposed without diversity policies, i.e., output is not restricted to the top 𝑘𝑑𝑖𝑣𝑒𝑟𝑠𝑖𝑡 𝑦 filters for each main topic. 6.1 Manually compiled filters To manually acquire filters in a simulated overchoice scenario, we randomly selected 10query tuples with the following conditions: • the resulting hotel hit count is at least 30 in our hotel review corpus; • the total number of corresponding reviews8 is at least 3000 in our hotel review corpus. Table 3 shows the query tuples used in this process. 
From the resulting reviews we randomly selected 1000 reviews for each query tuple. Next, using the selected reviews, we asked 3 crowd workers to extract all simple short phrases which, in their opinion, contain information meaningful for further filtering the choice set. Each worker manually grouped such phrases into clusters that share the same meaning. Finally, a fourth crowd worker aggregated all clusters, registering the number of contributing workers and the number of hotels each cluster links to. As the final output, we considered filters supported by a majority of contributors (at least 2 out of 3), ranked in descending order by the number of corresponding hotels. Table 4 shows an example of manually acquired filters.

7 One important note: before applying the diversity policies, we first remove filters that are semantically too similar to the original user queries. We do this using cosine similarity over their sentence embeddings, described in Section 4.2. Empirically, we set this similarity threshold to 0.8.
8 We performed exact text matching when retrieving hotel reviews that mention an original request.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Table 3. Query tuples used with manually compiled filters. (All examples are translations from Japanese.)

  Query tuples
  Delicious dinner. @Okinawa
  Relaxing atmosphere. @Nagano
  Very helpful staff. @Gunma
  Atmosphere that makes you feel at home. @Iwate
  Suitable for sightseeing. @Kyoto
  Near downtown. @Aichi
  Close to the sea. @Shizuoka
  Fashionable rooms. @Hokkaido
  Child friendly. @Chiba
  Hotel with good access. @Akita

Table 4. Manually selected filters for Very helpful staff. @Gunma.

  Filter                     Hotel count   Worker count
  Very satisfying food.           13            2
  Clean rooms.                    11            3
  Open-air bath available.        11            3
  Large rooms.                    10            3
  Suitable for families.           9            3
  Delicious breakfast.             8            3
  Suitable for couples.            8            2
  Buffet style breakfast.          7            3
  Cheap price.                     6            2

6.2 Overall filter list evaluation

In the first set of experiments we performed a pairwise evaluation against the target models described above. We used the same sets of 1000 random reviews per query tuple as in the manual filter acquisition process described in Section 6.1. For each query, we considered the top k_filters = 5 filters from each method's output. We crowdsourced the pairwise evaluation, asking 300 workers on Yahoo! Japan's crowdsourcing service9 to choose the filter list they find more suitable for further narrowing down the choice set. In randomized order, we showed the filter lists of the two methods (named lists A and B, respectively), asking the workers to choose exactly one of four options:
• list A is more useful than list B
• list B is more useful than list A
• list A and list B are both equally useful
• neither of the lists is useful
We also asked the workers to justify their choice for each filter list pair. After a basic data quality check (i.e., removing workers that (1) did not provide any explanation for their choices, (2) always chose the same option, or (3) whose working time was too short), we retained the results of 224 workers.

9 https://crowdsourcing.yahoo.co.jp/

Table 5. Filter list evaluation: vote share difference in points for our proposal against the baseline models for each evaluation query tuple (statistically significant differences in boldface).

  Evaluation query tuple                              vs human   vs non-diverse   vs unique   vs relevant
   1 Close to the sea. @Shizuoka                       +92.38       +76.27          -7.80        +2.29
   2 Relaxing atmosphere. @Nagano                      +13.73       +74.00         +25.22        -1.37
   3 Very helpful staff. @Gunma                         -8.94       +67.16         +42.03       +31.65
   4 Atmosphere that makes you feel at home. @Iwate    -18.00       +56.99         +36.73       -43.48
   5 Suitable for sightseeing. @Kyoto                  +65.88       +14.00         +15.15       +76.53
   6 Near downtown. @Aichi                             +29.76       +83.18         +20.34       +69.59
   7 Delicious dinner. @Okinawa                        -19.64       -22.08          -5.00       -18.18
   8 Fashionable rooms. @Hokkaido                      +67.90       +43.95          -2.90       +77.40
   9 Child friendly. @Chiba                             -6.09        +9.83         +13.13       -35.48
  10 Hotel with good access. @Akita                     +6.85       -16.14         -19.35       +40.64

Table 5 shows the overall list comparison results for each query tuple. Against the manually generated human filters, proposed was judged the significantly better10 overall choice for 5 out of the 10 evaluation query tuples. In 2 out of 10 cases the human output was judged significantly superior to the output of proposed. Overall, proposed had a significant advantage over human, with a difference of over 30 points. Analysing the workers' comments, we observed that the overall output of proposed was overwhelmingly preferred over human when the filters offered very specific choices or experiences (e.g., the parking lot is large, thus easy to park the car; delicious food with local ingredients). At the same time, human was preferred by workers who value filters that proposed considered relevant, but not unique enough to rank high (e.g., large room, clean hotel). It is also worth mentioning that when human outperformed proposed, the number of votes for either of the methods was actually smaller than average, with both lists being equally preferred or unpreferred by a large number of workers. We can also note that proposed was voted the better overall choice by an overwhelming majority for a number of query tuples (e.g., Close to the sea @Shizuoka).
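The significance judgement used throughout this comparison (footnote 10: a binomial test with p = 0.05) can be sketched as follows. This is an illustrative, stdlib-only implementation rather than the authors' code, and it assumes that the vote share difference is computed over workers who preferred one list over the other (ties excluded).

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k out of n."""
    def pmf(i):
        return comb(n, i) * p**i * (1 - p)**(n - i)
    observed = pmf(k)
    return min(1.0, sum(pmf(i) for i in range(n + 1)
                        if pmf(i) <= observed + 1e-12))

def compare_lists(votes_a, votes_b, alpha=0.05):
    """Vote share difference in points (list A minus list B) and whether
    it is significant under a two-sided binomial test, counting only
    workers who preferred one of the two lists."""
    n = votes_a + votes_b
    diff = 100.0 * (votes_a - votes_b) / n
    return diff, binom_two_sided_p(votes_a, n) < alpha
```

For example, a 30-to-10 split yields a +50-point difference that is significant at alpha = 0.05, while a 21-to-19 split is not.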
The reason for this vote difference is that proposed managed to identify filters that are highly specific to the initial query tuple and at the same time quite appealing to potential customers (e.g., the splendid alphonsino was very delicious), while human failed to identify such filters frequently enough. Against non-diverse as well, proposed exhibited a significant vote advantage (45.4 points), validating the effect of the diversity policies. However, non-diverse did perform better on some query tuples involving locations that are especially recognized or famous for a specific main topic, which the non-diverse method captured and over-represented (e.g., the nature topic in Akita). Proposed also outperformed unique and relevant, by over 6.3 and 29.2 points of vote count difference, respectively, validating that both relevance and uniqueness contribute significantly to proposed. It is worth mentioning that relevant behaved very similarly to human against proposed, suggesting that the workers employed in acquiring the manual filters may have preferred more relevant, rather than unique, filters.

10 We checked for significance using a binomial test with p set to 0.05.

Table 6. Individual filter evaluation: vote share difference in points for our proposal against the baseline models for each evaluation query tuple (statistically significant differences in boldface).

  Evaluation query tuple                              vs human   vs unique   vs relevant
   1 Close to the sea. @Shizuoka                       +80.80      -7.53       +21.51
   2 Relaxing atmosphere. @Nagano                       +8.37      -3.00        +3.10
   3 Very helpful staff. @Gunma                         +0.02     +16.74        -4.94
   4 Atmosphere that makes you feel at home. @Iwate     +7.47     +20.00        -3.43
   5 Suitable for sightseeing. @Kyoto                  +78.33     +24.46       +54.30
   6 Near downtown. @Aichi                              +5.58     +16.49       +52.68
   7 Delicious dinner. @Okinawa                        -33.88     +15.34       -31.85
   8 Fashionable rooms. @Hokkaido                      +20.90     +25.88       +29.64
   9 Child friendly. @Chiba                             -5.58      -3.65        -3.21
  10 Hotel with good access. @Akita                     +8.12      +9.46        +5.35

6.3 Individual filter evaluation

In the second set of experiments we performed a pairwise comparison of the individual filters output by proposed against the outputs of the target methods11. Here we attempt to counter the tendency some workers may have had, during the list-based evaluation described in Section 6.2, to reject certain filter lists merely because they contained some unappealing filters. We used the same filters as in the list-based evaluation, merging and shuffling the filters into a single list. In case of duplicates, a single occurrence was retained. We crowdsourced the evaluation, asking 300 workers on Yahoo! Japan's crowdsourcing service. As opposed to the filter list evaluation, here workers were asked to award votes to individual filters, by selecting from 1 up to at most 5 filters they find appealing for further reducing the choice set. We also asked the workers to justify their choice for each selection set. After a basic data quality check (i.e., removing workers that (1) did not conform to the rule regarding the number of selected filters, or (2) whose working time was too short), we retained the results of 197 workers. For each query, we counted the total number of votes each method received. Filters that were duplicates counted for both methods. The results for each query tuple are summed up in Table 6. Against the manually acquired human method, proposed had a significantly larger vote share for 5 out of the 10 evaluation queries, while being outperformed in only one case.
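The ballot construction and vote tallying described above can be sketched as follows; the function names and data layout are illustrative, not from the authors' implementation.

```python
from collections import Counter
import random

def build_ballot(method_filters):
    """Merge all methods' filter lists into a single deduplicated,
    shuffled list shown to the workers."""
    merged = list(dict.fromkeys(
        f for filters in method_filters.values() for f in filters))
    random.shuffle(merged)
    return merged

def tally_votes(method_filters, worker_selections):
    """Total votes per method. A duplicate filter (output by more than
    one method) counts for every method that produced it."""
    votes = Counter()
    for selection in worker_selections:
        for f in selection:
            for method, filters in method_filters.items():
                if f in filters:
                    votes[method] += 1
    return votes
```

For instance, if both methods output "clean rooms", a vote for it credits each method with one vote, so the per-method totals can exceed the raw number of votes cast.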
We also found that proposed received more votes for filters that express experiences or quality judgements, i.e., positive opinions regarding a specific service or the hotel in general (e.g., the open-air hot spring was excellent, the free breakfast was delicious), while the human filters tended to be factual or presence/absence indicators (e.g., open-air hot spring available, free breakfast). Similarly to the overall filter list evaluation, human received numerous votes for filters that are highly relevant, but were ruled out by proposed's size constraints (e.g., clean rooms being too frequent, washing machine available being too rare). Also similarly to the filter list evaluation, proposed outperformed both relevant and unique. However, it must be noted that both relevant and unique received numerous votes for filters that are very relevant but less unique (e.g., clean rooms, excellent service, free wifi) or unique but arguably not relevant enough (e.g., karaoke machine is available, dog run attached to the hotel). These results suggest that while relevance and uniqueness both contribute to proposed, their relative importance is highly subjective.

11 Here we skip the pairwise comparison against non-diverse, since the target of this experiment is individual filters, as opposed to filter sets.

7 DISCUSSIONS AND FUTURE DIRECTIONS

We found that proposed was preferred by crowd workers in two distinct scenarios. Firstly, as observed during both the list-based and individual filter evaluations, workers preferred highly specific filters over more generic ones (e.g., the splendid alphonsino was very delicious versus food was delicious; the parking lot is large, thus easy to park the car versus parking lot available).
This validates our assumption that the majority of potential customers are interested in very specific details when attempting to reach a decision. Secondly, during the individual filter evaluation, we observed the workers' tendency to prefer filters formulated as an experience or quality judgement, rather than as a fact or presence/absence indicator, when both options were available (e.g., hot bath was great versus hot bath is available; food was delicious versus food available at hotel). This tendency was weak for topics in which the experience itself may not be too relevant (e.g., the experience-expressing easy to park was not overwhelmingly preferred over the factual parking lot available, arguably because the fact that parking is actually available is the crucial piece of information, rather than the ease of parking), but it was very prominent for topics such as food, location, and other service-related ones, where previous users' experiences and reviews are more valuable than the simple availability of that specific option. We validated this assumption with a very simple experiment. We manually selected 30 (fact, experience) filter pairs, and for each pair we asked 20 crowd workers to choose the filter they find more suitable for further narrowing down the choice set. We randomized the order of the two filters (named A and B, respectively), asking the workers to choose exactly one of four options:
• filter A is more useful than filter B
• filter B is more useful than filter A
• filter A and filter B are both equally useful
• neither of the filters is useful
After a basic data quality check (i.e., removing workers that (1) always chose the same option, or (2) whose working time was too short), we retained the results of 19 workers. In 28 out of 30 pairs the experience-based filter received the higher share of votes, although both fact- and experience-based ones received many votes.
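The 28-out-of-30 count behind this experiment reduces to a simple per-pair tally; the sketch below uses illustrative vote keys ("fact", "experience", "both", "neither") rather than anything specified in the paper.

```python
from collections import Counter

def pair_preference(votes):
    """Decide one (fact, experience) filter pair from its vote counts
    over the four options shown to the workers."""
    if votes["experience"] > votes["fact"]:
        return "experience"
    if votes["fact"] > votes["experience"]:
        return "fact"
    return "tie"

def summarize(pair_votes):
    """Count how many pairs favored the experience-based filter,
    the fact-based one, or neither."""
    return Counter(pair_preference(v) for v in pair_votes)
```

A tie here corresponds to the single pair that was voted equally helpful; "both"/"neither" votes do not decide a pair either way.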
Most of these pairs had the topic of location, food, hot spring, or some other type of hotel service. For a single pair, the difference was only minimally in favor of the experience-based filter, namely the one with parking as its topic. One pair was voted equally helpful, having received only a few votes for either the fact- or the experience-based filter, in the topic of hotel amenities (amenities are available versus very basic amenity). This result suggests that the balance between fact- and experience-based filters is both subjective and possibly topic dependent. While customer reviews mainly offer intricate experience-like details, fact-based filters still remain valuable. As the filters provided by conventional hotel reservation systems are largely fact-based, intricate experience-based filters extracted from customer reviews could undoubtedly add significant value. In deploying such a customer-review-based filter recommender, strategies need to be implemented to combine various types of filters from multiple sources of information. In the future we plan to investigate how various sources of information can complement each other in providing meaningful filters.

8 CONCLUSIONS

In this paper we proposed a simple clustering-based approach to address the overchoice problem in the hotel industry domain. We introduced size control and diversity policies, together with scoring verticals, in order to identify and score filters that could reduce the search space in a natural and intuitive way. We validated our proposal through a series of user experiments, in which we also showed that the filters identified by our method were more useful than the manually acquired ones.