Learning to Measure Quality of Queries for Automatic Query Suggestion

Xian Chen                                    Hyoseop Shin
Dept. of Internet & Multimedia Eng.          Dept. of Internet & Multimedia Eng.
Konkuk University, Seoul, Korea              Konkuk University, Seoul, Korea
+82-2-450-4071                               +82-2-2049-6117
chenxian@konkuk.ac.kr                        hsshin@konkuk.ac.kr

ABSTRACT
Users tend to use their own terms to search for items in structured search systems such as restaurant search (e.g., Yelp), but because they lack an understanding of the system's internal vocabulary and structures, they often fail to search effectively, which leads to unsatisfying results. In such cases, the search system should help users adopt different terms that yield better results. To address this issue, we develop a scheme that generates suggested queries for a given user query. As a first step, in this paper we propose a scheme to evaluate queries (i.e., user queries and/or suggested queries) based on two measures: 1) whether the query will return a sufficient number of search results, and 2) whether the query will return search results of high quality. We further present a learning model to choose among alternative candidate queries for a user query. Our experiments show that the proposed method is feasible and scalable.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information Filtering

General Terms
Algorithms.

Keywords
Quality of queries, query suggestion, learning to measure

Copyright is held by the author/owner(s). RecSys 2014 Poster Proceedings, October 6-10, 2014, Foster City, Silicon Valley, USA.

1. INTRODUCTION
Generic Internet search rarely confronts empty search results: web search engines such as Google almost always return millions of pages. However, when users apply their own terms to controlled-vocabulary, structure-based search systems such as restaurant search on Yelp, their limited knowledge of the internal vocabulary and structures means they may obtain empty, excessive, or unsatisfying results. In such cases, the system should suggest better query terms. Although generic search engines (e.g., Google) also offer query suggestion based on users' co-click frequency, they do not fully exploit the controlled and structured domain vocabulary of specific areas (e.g., restaurant search), where users expect search results of higher quality than generic engines provide. To help users choose better search terms, we develop a scheme that suggests alternative queries for a given user query. To do this, we evaluate the quality of the user's query and/or the suggested queries. Hence, we propose two measures: 1) whether the query will return a sufficient number of search results, and 2) whether the query will return search results of high quality. Based on these two measures, we build query suggestion and measurement models on one of the restaurant search systems in Korea. Furthermore, we present a learning model to select alternative candidate queries for a user query.

2. QUERY SUGGESTION
For restaurant search, the most common types of search term are location and food. Figure 1 shows how our query suggestion model deals with both. Depending on the input, the model suggests different options. First, the model checks the types of search terms in the original query: 1) a single food term; 2) a single location term; 3) a food term with a location term. For cases 1) and 2), the model has four choices: relax to a broader-scope term (R), change to a similar term (C), tighten (T) to a narrower-scope term, or do nothing (N). For case 3), since there are four actions (N, R, C, T) and two types of terms, the model has 4^2 = 16 options to suggest; for instance, RC means relaxing the food term while changing the location term. This scheme scales to more types of terms: with n types there are 4^n options to suggest.

Figure 1. Query Suggestion Processing

In previous work, knowledge-based recommender systems [1] relaxed queries by removing users' constraints, and query recommender systems [2] built a hierarchical structure for each attribute in order to relax conflicting user requirements and avoid empty results. In contrast, we build hierarchical structures of location and food not only for relaxing, but also for tightening and changing. For location, we build the hierarchy at both a vertical and a horizontal level: the vertical hierarchy is used to relax and tighten, while the horizontal level is used to change to neighboring areas. Meanwhile, we obtain the food hierarchy by applying hierarchical clustering, again for relaxing and tightening. To support changing to similar foods, we compute food similarity by building a restaurant-food matrix and applying collaborative filtering, non-negative matrix factorization, and latent semantic analysis; we empirically select the best results among the three.

3. QUERY MEASURES
In controlled-vocabulary, structure-based search, there is little research on query measurement. Here, we evaluate queries based on two measures:

- Quantity: the query should return a sufficient number of results. For controlled-vocabulary, structure-based search, the number of returned results should be neither empty nor too large. Since restaurants are distributed very unevenly across Korea, we can set neither a fixed number nor an interval to measure the quantity of results.

- Quality: the query should return search results of high quality. To define a high-quality restaurant (HQ-rest), we collected blogs from the NAVER blogosphere in Korea (http://blog.naver.com/) and ranked restaurants according to power bloggers' contribution [3] through the social network. However, we cannot use the same criterion to measure HQ-rests in different locations, since both the distribution of restaurants and the distribution of HQ-rests are unbalanced across the country (e.g., the top-ranked restaurants in the capital, Seoul, are entirely different from those in a small town). Hence, we define HQ-rests relative to area levels, such as province-level, city-level, town-level, district-level, valley-level, and even street-level. The selection rule is:

    K = min { k : sum_{i=1..k} s_i >= alpha * sum_{i=1..R} s_i }        (1)

where s_1 >= s_2 >= ... >= s_R are the ranked scores of the R restaurants in one area, and alpha is a constant in (0, 1). That is, we keep adding restaurants from the highest score downward until the accumulated score reaches the ratio alpha of the total score; these top K restaurants are selected as the HQ-rests of the area at that level. In our experiment, we set alpha = 0.5.

4. LEARNING TO MEASURE
Neither quantity nor quality can be measured by a simple threshold or interval. Therefore, we build a learning model that classifies a query as good or bad using six criteria in terms of quantity and quality. The criteria are as follows:

Quantity:
- the number of returned restaurants;
- the number of matched restaurants among the top 10 of the returned list, where a matched restaurant is one that satisfies the user's requirements. (E.g., if the user inputs "chicken" intending to find fried-chicken restaurants, but the returned results include other types such as chicken soup, we count only the fried-chicken restaurants as matched.)

Quality:
- the ranked position of the first matched restaurant;
- the maximum score among the returned restaurants;
- the number of HQ-rests for the area level among the returned restaurants.

Others:
- the location level.

We built training data by having humans evaluate 302 queries: 76 were labeled good and the rest bad. We trained learning models with several algorithms; their precision, recall, and F-score under 10-fold cross validation are shown in Table 1. The average accuracy exceeds 77%.

Table 1. Accuracies for machine learning models
Algorithm        Precision   Recall   F-score
Naïve Bayes      58.5%       97.6%    73.1%
SVM              71.4%       47.1%    56.7%
Decision Tree    82.5%       94.1%    87.9%
Neural Network   79.3%       76.5%    77.8%
Random Tree      86.9%       91.3%    89.0%

Figure 2 plots the distribution of the first matched position against the number of HQ-rests for each location level. It indicates that high-level locations contain more HQ-rests than low-level ones, and that the first matched position lies in [1, 3] for most good (red) queries. However, we cannot use this criterion alone to evaluate a query, since within [1, 3] the first matched positions of bad (blue) queries are spread everywhere. This is why we design several criteria: learning algorithms over multiple criteria help us handle such complicated cases. This ambiguous distribution also encourages us to analyze the criteria in more detail in the future.

Figure 2. The relationship among three criteria

5. CONCLUSION
In this paper, to help users obtain better results in restaurant search, we propose a scheme to suggest alternative queries and present a learning model to measure queries in terms of quantity and quality. Query suggestion can be extended to additional types of search terms. The query measurement experiments showed that our method is feasible and scalable. Our future work includes applying probabilistic models (e.g., PLSA, LDA) to the restaurant-food matrix, accumulating a larger training set, and applying heuristic and ranking methods to query measurement.

6. ACKNOWLEDGMENTS
This research is supported by the Mid-career Researcher Program through an NRF grant funded by the MEST (2011-0029729).

7. REFERENCES
[1] Burke, R. Knowledge-based Recommender Systems. Encyclopedia of Library and Information Systems (A. Kent, ed.), 69, 32 (2000).
[2] Mirzadeh, N., Ricci, F., and Bansal, M. Supporting User Query Relaxation in a Recommender System. Lecture Notes in Computer Science, 3182 (2004), 31-40.
[3] Shin, H. and Lee, J. Impact and Degree of User Sociability in Social Media. Information Sciences, 196, 1 (Aug. 2012), 23-46.
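The suggestion scheme in Section 2 composes one action per term type, so the options can be enumerated mechanically. A minimal sketch of that enumeration, assuming the actions combine independently across term types (the function name and term labels are illustrative, not from the paper):

```python
from itertools import product

ACTIONS = ["N", "R", "C", "T"]  # do nothing, relax, change, tighten

def suggestion_options(term_types):
    """All action combinations for a query: one action per term type."""
    return ["".join(combo) for combo in product(ACTIONS, repeat=len(term_types))]

# Case 3 in the text: a food term plus a location term -> 4^2 = 16 options.
options = suggestion_options(["food", "location"])
print(len(options))     # 16
print("RC" in options)  # True: relax the food term, change the location term
```

With n term types this yields 4^n options, matching the paper's scalability claim.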
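The vertical/horizontal location hierarchy described in Section 2 can be sketched as a small tree whose nodes also keep same-level neighbor links. This is a simplified illustration; the class shape and the area names are assumptions, not the paper's actual data structure:

```python
class Area:
    """A node in the location hierarchy: vertical parent/children, horizontal neighbors."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # vertical: one level up
        self.children = []        # vertical: one level down
        self.neighbors = []       # horizontal: neighboring areas at the same level
        if parent:
            parent.children.append(self)

    def relax(self):              # R: broader-scope term
        return self.parent.name if self.parent else None

    def tighten(self):            # T: narrower-scope terms
        return [c.name for c in self.children]

    def change(self):             # C: similar (neighboring) terms
        return [n.name for n in self.neighbors]

seoul = Area("Seoul")
gangnam = Area("Gangnam-gu", parent=seoul)
seocho = Area("Seocho-gu", parent=seoul)
gangnam.neighbors.append(seocho)

print(gangnam.relax())   # Seoul
print(seoul.tighten())   # ['Gangnam-gu', 'Seocho-gu']
print(gangnam.change())  # ['Seocho-gu']
```

Relax and tighten walk the vertical hierarchy, while change follows the horizontal neighbor links, mirroring the paper's split.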
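The food-similarity step in Section 2 builds a restaurant-food matrix and applies latent factor methods. A minimal latent-semantic-analysis-style sketch, assuming a tiny made-up matrix and k = 2 latent dimensions (the foods, counts, and k are all illustrative):

```python
import numpy as np

# Toy restaurant-food matrix (rows: restaurants, columns: foods).
foods = ["fried chicken", "chicken soup", "pizza"]
M = np.array([[5, 1, 0],
              [4, 2, 0],
              [0, 0, 3],
              [1, 0, 4]], dtype=float)

# LSA-style similarity: project the food columns into a low-rank space via
# truncated SVD, then compare foods by cosine similarity.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2                                      # number of latent dimensions (assumed)
food_vecs = (np.diag(s[:k]) @ Vt[:k]).T    # one latent vector per food

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_chicken = cosine(food_vecs[0], food_vecs[1])  # fried chicken vs chicken soup
sim_pizza = cosine(food_vecs[0], food_vecs[2])    # fried chicken vs pizza
print(sim_chicken > sim_pizza)  # True: chicken dishes co-occur in the same restaurants
```

The paper also tries collaborative filtering and non-negative matrix factorization on the same matrix and empirically picks the best; only the factorization step would differ.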
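The HQ-rest selection of Eq. (1) can be sketched directly: within one area, keep the top-ranked restaurants until their accumulated score reaches the ratio alpha of the total. The scores below are made up for illustration:

```python
def select_hq_rest(scores, alpha=0.5):
    """Return the top-K scores whose running sum first reaches alpha * total,
    with 0 < alpha < 1 (the paper uses alpha = 0.5)."""
    ranked = sorted(scores, reverse=True)  # sort by ranked score, descending
    total = sum(ranked)
    acc, top_k = 0.0, []
    for s in ranked:
        acc += s
        top_k.append(s)
        if acc >= alpha * total:           # accumulated ratio reached
            break
    return top_k

scores = [9.0, 7.0, 3.0, 2.0, 1.0]         # total = 22.0, threshold = 11.0
print(select_hq_rest(scores, alpha=0.5))   # [9.0, 7.0] -> 9.0 + 7.0 = 16.0 >= 11.0
```

Running this per area level (province, city, town, ...) gives a level-specific HQ-rest set, which is how the paper sidesteps the unbalanced nationwide distribution.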
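Table 1 reports precision, recall, and F-score for the good/bad query classifiers trained on the six criteria. A minimal sketch of how those metrics are computed from predictions (the labels and predictions below are made up, not the paper's data):

```python
def precision_recall_f1(y_true, y_pred, positive="good"):
    """Precision, recall, and F-score for the positive ('good' query) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ["good", "good", "bad", "bad", "good", "bad"]
y_pred = ["good", "bad",  "bad", "good", "good", "bad"]
p, r, f = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```

In the paper these metrics are averaged over 10-fold cross validation on the 302 labeled queries; the sketch above shows only the per-fold scoring step.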