Learning to Measure Quality of Queries for Automatic Query Suggestion

Xian Chen                                    Hyoseop Shin
Dept. of Internet & Multimedia Eng.          Dept. of Internet & Multimedia Eng.
Konkuk University, Seoul, Korea              Konkuk University, Seoul, Korea
+82-2-450-4071                               +82-2-2049-6117
chenxian@konkuk.ac.kr                        hsshin@konkuk.ac.kr

ABSTRACT
Users tend to use their own terms to search for items in structured search systems such as restaurant search (e.g., Yelp), but because they lack an understanding of the system's internal vocabulary and structures, they often fail to search effectively, which leads to unsatisfying results. In such cases, the search system should help users adopt different terms that yield better results. To address this issue, we develop a scheme that generates suggested queries for a given user query. As a first step, in this paper we propose a scheme to evaluate queries (i.e., user queries and/or suggested queries) based on two measures: 1) whether the query will return a sufficient number of search results, and 2) whether the query will return search results of high quality. We further present a learning model to choose among alternative candidate queries for a user query. Our experiments show that the proposed method is feasible and scalable.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information Filtering

General Terms
Algorithms.

Keywords
Quality of queries, query suggestion, learning to measure

Copyright is held by the author/owner(s). RecSys 2014 Poster Proceedings, October 6-10, 2014, Foster City, Silicon Valley, USA.

1. INTRODUCTION
Generic Internet search rarely confronts empty search results: web search engines such as Google almost always return millions of pages. However, when users apply their own terms to controlled-vocabulary, structure-based search systems such as restaurant search on Yelp, their limited knowledge of the internal vocabulary and structures means they may obtain empty, excessive, or unsatisfying results. In such cases, the system should suggest better query terms. Although generic search engines (e.g., Google) also offer query suggestion based on users' co-click frequency, they do not fully exploit the controlled and structured domain vocabulary of specific areas (e.g., restaurant search), where users expect search results of higher quality than generic engines provide. To help users choose better search terms, we develop a scheme that suggests alternative queries for a given user query. To do this, we evaluate the quality of the user's query and/or the suggested queries. Hence, we propose two measures: 1) whether the query will return a sufficient number of search results, and 2) whether the query will return search results of high quality. Based on these two measures, we build query suggestion and measurement models on one of the restaurant search systems in Korea. Furthermore, we present a learning model to select alternative candidate queries for a user query.

2. QUERY SUGGESTION
For restaurant search, the most common types of search term are location and food. Figure 1 shows how our query suggestion model deals with both. Depending on the input, the model suggests different options. First, the model checks the types of search terms in the original query: 1) a single food term; 2) a single location term; 3) a food term with a location term. For cases 1) and 2), the model has four choices: relax to a broader-scope term (R), change to a similar term (C), tighten (T) to a narrower-scope term, or do nothing (N). For case 3), since there are four actions (N, R, C, T) and two types of terms, the model has 4^2 = 16 options to suggest; for instance, RC means relaxing the food term while changing the location term. This scheme scales to more types of terms: with n types there are 4^n options to suggest.

Figure 1. Query Suggestion Processing

In previous work, knowledge-based recommender systems [1] relaxed queries by removing users' constraints, and query recommender systems [2] built a hierarchical structure for each attribute in order to relax conflicting user requirements and avoid empty results. In contrast, we build hierarchical structures of location and food not only for relaxing, but also for tightening and changing. For location, we build the hierarchy at both a vertical and a horizontal level: the vertical hierarchy is used to relax and tighten, while the horizontal level is used to change to neighboring areas. Meanwhile, we obtain the food hierarchy by applying hierarchical clustering, again for relaxing and tightening. To support changing to similar foods, we compute food similarity by building a restaurant-food matrix and applying collaborative filtering, non-negative matrix factorization, and latent semantic analysis; we empirically select the best results among the three.

3. QUERY MEASURES
In controlled-vocabulary, structure-based search, there is little research on query measurement. Here, we evaluate queries based on two measures:

- Quantity: the query should return a sufficient number of results. For controlled-vocabulary, structure-based search, the number of returned results should be neither empty nor too large. Since restaurants are distributed very unevenly across Korea, we can set neither a fixed number nor an interval to measure the quantity of results.

- Quality: the query should return search results of high quality. To define a high-quality restaurant (HQ-rest), we collected blogs from the NAVER blogosphere in Korea (http://blog.naver.com/) and ranked restaurants according to power bloggers' contribution [3] through the social network. However, we cannot use the same criterion to measure HQ-rests in different locations, since both the distribution of restaurants and the distribution of HQ-rests are unbalanced across the country (e.g., the top-ranked restaurants in the capital, Seoul, are entirely different from those in a small town). Hence, we define HQ-rests relative to area levels, such as province-level, city-level, town-level, district-level, valley-level, and even street-level. The selection rule is:

    K = min { k : sum_{i=1..k} s_i >= alpha * sum_{i=1..R} s_i }        (1)

where s_1 >= s_2 >= ... >= s_R are the ranked scores of the R restaurants in one area, and alpha is a constant in (0, 1). That is, we keep adding restaurants from the highest score downward until the accumulated score reaches the ratio alpha of the total score; these top K restaurants are selected as the HQ-rests of the area at that level. In our experiment, we set alpha = 0.5.

4. LEARNING TO MEASURE
Neither quantity nor quality can be measured by a simple threshold or interval. Therefore, we build a learning model that classifies a query as good or bad using six criteria in terms of quantity and quality. The criteria are as follows:

Quantity:
- the number of returned restaurants;
- the number of matched restaurants among the top 10 of the returned list, where a matched restaurant is one that satisfies the user's requirements. (E.g., if the user inputs "chicken" intending to find fried-chicken restaurants, but the returned results include other types such as chicken soup, we count only the fried-chicken restaurants as matched.)

Quality:
- the ranked position of the first matched restaurant;
- the maximum score among the returned restaurants;
- the number of HQ-rests for the area level among the returned restaurants.

Others:
- the location level.

We built training data by having humans evaluate 302 queries: 76 were labeled good and the rest bad. We trained learning models with several algorithms; their precision, recall, and F-score under 10-fold cross validation are shown in Table 1. The average accuracy exceeds 77%.

Table 1. Accuracies for machine learning models
Algorithm        Precision   Recall   F-score
Naïve Bayes      58.5%       97.6%    73.1%
SVM              71.4%       47.1%    56.7%
Decision Tree    82.5%       94.1%    87.9%
Neural Network   79.3%       76.5%    77.8%
Random Tree      86.9%       91.3%    89.0%

Figure 2 plots the distribution of the first matched position against the number of HQ-rests for each location level. It indicates that high-level locations contain more HQ-rests than low-level ones, and that the first matched position lies in [1, 3] for most good (red) queries. However, we cannot use this criterion alone to evaluate a query, since within [1, 3] the first matched positions of bad (blue) queries are spread everywhere. This is why we design several criteria: learning algorithms over multiple criteria help us handle such complicated cases. This ambiguous distribution also encourages us to analyze the criteria in more detail in the future.

Figure 2. The relationship among three criteria

5. CONCLUSION
In this paper, to help users obtain better results in restaurant search, we propose a scheme to suggest alternative queries and present a learning model to measure queries in terms of quantity and quality. Query suggestion can be extended to additional types of search terms. The query measurement experiments showed that our method is feasible and scalable. Our future work includes applying probabilistic models (e.g., PLSA, LDA) to the restaurant-food matrix, accumulating a larger training set, and applying heuristic and ranking methods to query measurement.

6. ACKNOWLEDGMENTS
This research is supported by the Mid-career Researcher Program through an NRF grant funded by the MEST (2011-0029729).

7. REFERENCES
[1] Burke, R. Knowledge-based Recommender Systems. Encyclopedia of Library and Information Systems (A. Kent, ed.), 69, 32 (2000).
[2] Mirzadeh, N., Ricci, F., and Bansal, M. Supporting User Query Relaxation in a Recommender System. Lecture Notes in Computer Science, 3182 (2004), 31-40.
[3] Shin, H. and Lee, J. Impact and Degree of User Sociability in Social Media. Information Sciences, 196, 1 (Aug. 2012), 23-46.
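The suggestion scheme in Section 2 composes one action per term type, so the options can be enumerated mechanically. A minimal sketch of that enumeration, assuming the actions combine independently across term types (the function name and term labels are illustrative, not from the paper):

```python
from itertools import product

ACTIONS = ["N", "R", "C", "T"]  # do nothing, relax, change, tighten

def suggestion_options(term_types):
    """All action combinations for a query: one action per term type."""
    return ["".join(combo) for combo in product(ACTIONS, repeat=len(term_types))]

# Case 3 in the text: a food term plus a location term -> 4^2 = 16 options.
options = suggestion_options(["food", "location"])
print(len(options))     # 16
print("RC" in options)  # True: relax the food term, change the location term
```

With n term types this yields 4^n options, matching the paper's scalability claim.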
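The vertical/horizontal location hierarchy described in Section 2 can be sketched as a small tree whose nodes also keep same-level neighbor links. This is a simplified illustration; the class shape and the area names are assumptions, not the paper's actual data structure:

```python
class Area:
    """A node in the location hierarchy: vertical parent/children, horizontal neighbors."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # vertical: one level up
        self.children = []        # vertical: one level down
        self.neighbors = []       # horizontal: neighboring areas at the same level
        if parent:
            parent.children.append(self)

    def relax(self):              # R: broader-scope term
        return self.parent.name if self.parent else None

    def tighten(self):            # T: narrower-scope terms
        return [c.name for c in self.children]

    def change(self):             # C: similar (neighboring) terms
        return [n.name for n in self.neighbors]

seoul = Area("Seoul")
gangnam = Area("Gangnam-gu", parent=seoul)
seocho = Area("Seocho-gu", parent=seoul)
gangnam.neighbors.append(seocho)

print(gangnam.relax())   # Seoul
print(seoul.tighten())   # ['Gangnam-gu', 'Seocho-gu']
print(gangnam.change())  # ['Seocho-gu']
```

Relax and tighten walk the vertical hierarchy, while change follows the horizontal neighbor links, mirroring the paper's split.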
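The food-similarity step in Section 2 builds a restaurant-food matrix and applies latent factor methods. A minimal latent-semantic-analysis-style sketch, assuming a tiny made-up matrix and k = 2 latent dimensions (the foods, counts, and k are all illustrative):

```python
import numpy as np

# Toy restaurant-food matrix (rows: restaurants, columns: foods).
foods = ["fried chicken", "chicken soup", "pizza"]
M = np.array([[5, 1, 0],
              [4, 2, 0],
              [0, 0, 3],
              [1, 0, 4]], dtype=float)

# LSA-style similarity: project the food columns into a low-rank space via
# truncated SVD, then compare foods by cosine similarity.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2                                      # number of latent dimensions (assumed)
food_vecs = (np.diag(s[:k]) @ Vt[:k]).T    # one latent vector per food

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_chicken = cosine(food_vecs[0], food_vecs[1])  # fried chicken vs chicken soup
sim_pizza = cosine(food_vecs[0], food_vecs[2])    # fried chicken vs pizza
print(sim_chicken > sim_pizza)  # True: chicken dishes co-occur in the same restaurants
```

The paper also tries collaborative filtering and non-negative matrix factorization on the same matrix and empirically picks the best; only the factorization step would differ.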
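The HQ-rest selection of Eq. (1) can be sketched directly: within one area, keep the top-ranked restaurants until their accumulated score reaches the ratio alpha of the total. The scores below are made up for illustration:

```python
def select_hq_rest(scores, alpha=0.5):
    """Return the top-K scores whose running sum first reaches alpha * total,
    with 0 < alpha < 1 (the paper uses alpha = 0.5)."""
    ranked = sorted(scores, reverse=True)  # sort by ranked score, descending
    total = sum(ranked)
    acc, top_k = 0.0, []
    for s in ranked:
        acc += s
        top_k.append(s)
        if acc >= alpha * total:           # accumulated ratio reached
            break
    return top_k

scores = [9.0, 7.0, 3.0, 2.0, 1.0]         # total = 22.0, threshold = 11.0
print(select_hq_rest(scores, alpha=0.5))   # [9.0, 7.0] -> 9.0 + 7.0 = 16.0 >= 11.0
```

Running this per area level (province, city, town, ...) gives a level-specific HQ-rest set, which is how the paper sidesteps the unbalanced nationwide distribution.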
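Table 1 reports precision, recall, and F-score for the good/bad query classifiers trained on the six criteria. A minimal sketch of how those metrics are computed from predictions (the labels and predictions below are made up, not the paper's data):

```python
def precision_recall_f1(y_true, y_pred, positive="good"):
    """Precision, recall, and F-score for the positive ('good' query) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ["good", "good", "bad", "bad", "good", "bad"]
y_pred = ["good", "bad",  "bad", "good", "good", "bad"]
p, r, f = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```

In the paper these metrics are averaged over 10-fold cross validation on the 302 labeled queries; the sketch above shows only the per-fold scoring step.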