An Advice Recommender System Based on Complaint Data Analysis

Liang Yang, Kwansei Gakuin University, Japan, dui93794@kwansei.ac.jp
Daisuke Kitayama, Kogakuin University, Japan, kitayama@cc.kogakuin.ac.jp
Kazutoshi Sumiya, Kwansei Gakuin University, Japan, sumiya@kwansei.ac.jp

ComplexRec 2019, 20 September 2019, Copenhagen, Denmark. Copyright ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
Many users post complaints about services on the Internet. Because users have various values and views, they may complain in different ways even when they receive the same service. However, it is quite difficult to respond to these varied demands in real time, and there are almost no direct solutions when users feel dissatisfied with a service. Therefore, in this paper, we propose an advice recommender system based on an analysis of complaint data from the Fuman Kaitori Center. First, the system generates query keywords from user complaints about a service by calculating the score of each query. Then, suitable web pages containing advice are recommended from the results of the query. This advice could address users' dissatisfaction and respond to their various demands in a comprehensive way. We also verify the usability of the proposed system through a questionnaire survey.

CCS CONCEPTS
• Information System → Information Retrieval; • Query Processing → Query Suggestion.

KEYWORDS
Recommender System, Query Extraction, Advice, Complaint Data

1 INTRODUCTION
In recent years, many users have posted negative reviews about services online. However, it is quite difficult to respond to various user demands in real time because the service is provided by the company. In addition, there are almost no direct solutions when users feel dissatisfied with a service. Therefore, this paper focuses on user complaints related to services and proposes a system that searches for advice that could address users' dissatisfaction by generating query keywords from complaint reviews [11]. This advice contains merits of the service that users may not be aware of and could respond to their different demands in a comprehensive way. An example of advice recommendation is shown in Figure 1.

Figure 1: Example of Advice Recommendation

The remainder of this paper is structured as follows. Section 2 presents a brief summary of related work. Section 3 introduces the dataset we use and explains the proposed system. Section 4 discusses the experimental results and the evaluation of the proposed system. Finally, Section 5 concludes this paper and discusses future work.

2 RELATED WORK

2.1 FKC Dataset
The FKC dataset has been used in several studies in recent years. Mitsuzawa et al. [1] presented the FKC dataset, which comes from the Fuman Kaitori Center (FKC). "Fuman" means dissatisfaction in Japanese. The FKC is a Japanese service that collects and analyzes consumers' negative opinions. In our work, we used and analyzed the FKC dataset.

Hasegawa et al. [2] analyzed and visualized the contents of the FKC dataset, such as the distribution of users' ages, jobs, and gender. In our work, we determined the target of the experiment based on their results.

2.2 Topic Word Extraction
Sakai et al. [5] proposed a method to extract negative words as expressions of dissatisfaction from blogs. They extracted nouns and adjectives to build a dictionary of dissatisfaction expressions. In our work, we extract only nouns because nouns can explain and represent the content of users' complaints.

Hashimoto et al. [6] proposed a method to extract important topics from newspapers and detect social problems based on document clustering. Utsumi et al. [4] proposed a method to extract technological solutions to social problems, such as medical issues, from the news. They extracted technological solution words by calculating the relevance of problems and technologies, which they defined as problem relevancy and technical relevancy. A higher relevancy indicated a higher possibility of extracting a technological solution word. In our work, we use this concept and extract the advice topic word by calculating the relevance of the company and the complaint topic. However, we hypothesize that a lower relevancy indicates a higher probability that a word is an advice topic word.

Yoshida et al. [7] proposed a method to extract feature terms from customer reviews on e-commerce sites in order to recommend similar items to users. They used polarity analysis to calculate the degree of importance of feature words by counting the number of positive reviews, negative reviews, and positive ratings. In our work, we also use polarity analysis to evaluate advice topic words. In addition, we weight words according to the result of the polarity analysis.

2.3 Query Generation
Song et al. [9] and Kajinami et al. [10] proposed systems that generate query keywords to support a user's search intention. Kakimoto et al. [8] proposed a system that extracts query keywords from the closed caption data of TV programs to recommend web pages related to tourism and events based on users' preferences. In our work, we extract query keywords from negative reviews to recommend web pages of advice, with the aim of addressing a user's dissatisfaction with a service.

2.4 Recommender System using Complaint Data
Hayashi et al. [3] proposed a system that recommends appropriate products to users according to their complaints. This system could directly resolve users' dissatisfaction by recommending a substitute product. In our work, we propose a system that resolves user complaints about services, rather than products, in an indirect way.

3 PROPOSED SYSTEM

3.1 System Overview
In this paper, we propose a system that analyzes complaint data from the Fuman Kaitori Center to recommend advice that addresses users' dissatisfaction with a service. Figure 2 shows the system flow of our proposed method. First, we extract the company name and the complaint topic words by calculating the importance of the nouns in the negative reviews. Second, we obtain candidate search keywords for these extracted words. Then, the system extracts the advice topic words by calculating the relevancy of the candidate keywords to the FKC dataset and scoring them using morphological and polarity analyses. Third, we create the query by combining the company name, the complaint topic word, and the advice topic word, according to the users' various complaints about a service. Finally, suitable web pages containing advice that could address a user's complaint are recommended from the results of the query.

Figure 2: System Overview

3.2 Dataset
In this study, we analyze a dataset of complaints from the Fuman Kaitori Center, which is provided by Insight Tech Inc. through the National Institute of Informatics. In this paper, we refer to the Fuman Kaitori Center's dataset as the FKC dataset. The Fuman Kaitori Center is a website on which users can post complaints about topics such as products, services, education, work, and relationships. Users earn points when they post complaints, and they can exchange these points for coupons for online shopping websites. The dataset contains about 5 million negative reviews that were posted from 18 March 2015 to 12 March 2017 by around 100,000 users. Each negative review contains the information shown in Table 1. In the FKC dataset, each category contains several subcategories, and each subcategory contains several companies. Because we focus on user complaints about services, our proposed system uses the data fields "company" and "text".

Table 1: Data Structure of the FKC dataset

Data Item      Content
post_id        complaint ID
user_id        Fuman Kaitori Center ID
category       complaint category
subcategory    detailed complaint category
company        company name
product        product name
text           negative review

3.3 Extraction of Company Names and Complaint Topic Words
Our proposed system extracts company names from the FKC dataset directly from the company field of each record. Next, we extract the complaint topic words by analyzing negative reviews. In this paper, we use only the negative reviews that are labeled with a company name. To extract complaint topic words, we first collect the negative reviews of all companies in one subcategory and extract all nouns from them. Next, we calculate the importance of each noun using the following equation.

\[
\text{importance} = \frac{tf}{|A|} \times \frac{tf}{\sum_{d \in D} tf_d} \tag{1}
\]

Here, $tf$ is the number of occurrences of a particular noun in the complaints for a certain company, $|A|$ is the number of all nouns in the complaints for that company, and $\sum_{d \in D} tf_d$ is the number of occurrences of the noun in the complaints for all companies. Finally, we extract all nouns whose importance values are above a determined threshold value and define them as that company's complaint topic words.

3.4 Extraction of Advice Topic Words
To extract the advice topic words, we first obtain candidate search keywords for the company name and the complaint topic word. Because the FKC dataset consists of negative reviews, we hypothesize that candidate keywords that are less relevant to the FKC dataset will make better advice topic words. To verify this hypothesis, we calculate the relevance of each candidate keyword to each company and to each complaint topic word, and define these values as "company relevancy" and "complaint topic relevancy". They are calculated using the following equations.

\[
\text{company relevancy} = \frac{R_{cd}}{R_c} \tag{2}
\]

\[
\text{complaint topic relevancy} = \frac{R_{td}}{R_t} \tag{3}
\]

Here, $R_{cd}$ is the number of occurrences of the candidate keyword in the complaints for the company in the FKC dataset, and $R_c$ is the number of negative reviews of that company. $R_{td}$ is the number of occurrences of the candidate keyword together with the complaint topic word in the negative reviews of the FKC dataset, and $R_t$ is the number of negative reviews that contain the complaint topic word.

After that, to exclude negative words as well as verbs and adjectives, which do not help users acquire advice, we weight the candidate keywords using morphological and polarity analyses, as shown in Table 2.

Table 2: Weight for Candidate Keywords

Result of Analysis                   Weight
negative                             0.8
verb                                 0.7
adjective                            0.7
proper noun (place name)             0.7
proper noun (organization name)      0.3
common noun                          0.3
verbal noun                          0.1

Finally, we calculate the final score of each candidate keyword by multiplying the arithmetic mean of the company and complaint topic relevancies by the weight, as in the following equation.

\[
\text{Score} = \frac{\text{company relevancy} + \text{complaint topic relevancy}}{2} \times \text{Weight} \tag{4}
\]

After calculating the final score of each candidate keyword, we determine the threshold value for each company. Candidate keywords whose scores are under the threshold value become the advice topic words.

3.5 Generation of Web Search Queries
In this study, each pair of company name and complaint topic word is matched with several advice topic words. To search for suitable websites, we use an OR-based search method to acquire advice websites. Our proposed system generates each query from one company name, one complaint topic word, and one advice topic word.

3.6 Recommendation of Advice
Our proposed system recommends suitable web pages containing advice from the results of the query, which is based on users' complaints. Figure 3 shows the user interface of our proposed system. First, the system generates several queries by analyzing the user's negative review. Next, the user can run a web search with the offered queries and choose and browse web pages according to their needs. The system thus recommends advice that could address the dissatisfaction expressed in the user's negative review.

Figure 3: User Interface

4 EXPERIMENT AND EVALUATION

4.1 Experiment
In this study, we conducted an experiment to extract the complaint and advice topic words in order to verify the feasibility of the proposed system. For this experiment, we analyzed the subcategory "IT web services" of the FKC dataset, which is under the category "industry." We analyzed 1,000 negative reviews for each of three companies.

First, we extracted the complaint topic words and determined a separate threshold value for each of the three companies. For company A, we extracted 186 complaint topic words above the threshold value of 0.00080. For company B, we extracted 144 complaint topic words above the threshold value of 0.00076. For company C, we extracted 86 complaint topic words above the threshold value of 0.00080. Table 3 shows examples of the complaint topic words for each company. These examples show that each complaint topic word indicates the object of a different user's dissatisfaction.

Table 3: Example Complaint Topic Words

Company   Complaint Topic Words
A         purchase, prime, delivery, review, delivery fee, order, membership, return, post, gift, cardboard box, price, sign, yamato, book
B         stamp, code, block, coin, group, backup, lock, telephone call, camera, setting, post, input, message, content, commercial
C         news, question, premium, answer, auction, article, title, navigation, mail, shopping, ID, weather forecast, transaction, search, comment

Next, we extracted advice topic words from the candidate keywords that had a score less than 0.0043, 0.0020, and 0.0033 for companies A, B, and C, respectively. Some examples of these words are shown in Table 4. As Table 4 shows, the proposed method is sufficient for ranking candidate keywords.

Table 4: Example Advice Topic Words

Company   Complaint Topic Word   Advice Topic Words
A         point                  charge, present, how to save up, how to use, credit card
B         setting                security, group friend, privacy, initialization, recommendation
C         premium                privilege, merit, cancellation of agreement, magazine

4.2 Evaluation
We conducted a questionnaire-based survey to evaluate the usability and effectiveness of the proposed method. The survey contained the following three questions. For Q1, we extracted all nouns from the negative review for respondents to choose from. For Q2, we provided 15 queries for each negative review to choose from; for five of these queries, the candidate keywords were created by the authors. For Q3, we evaluated satisfaction with the result of the advice recommendation.

Q1: Please choose one word which you think could represent the dissatisfaction of the following reviews.
Q2: Please choose the query that you think the contents returned by a search using this query keyword could address the complaints found in the negative review. (Multiple choices are allowed.)
Q3: Please make a web search with the queries you have chosen. Do you feel satisfied with the contents of the advice?

4.3 Result and Discussion
We collected answers from 10 respondents, and the results are shown in Table 5 and Table 6. We defined the nouns and queries that were chosen in more than five answers as true positives.

Table 5: Result of Q1

                  p@1
Average of p@k    0.60

The result of Q1 shows that when only the noun with the highest importance value is used, the appropriate complaint topic word is extracted from the negative review in 60% of cases. This indicates that the proposed method is effective and that ranking nouns by their importance performs well.

Table 6: Result of Q2

                  p@1     p@3     p@5
Average of p@k    0.70    0.67    0.60
Average of r@k    0.18    0.50    0.75
F-measure         0.28    0.57    0.67

The result of Q2 shows that when only the query with the lowest score is used, an appropriate query for a web search that addresses the complaint expressed in the negative review is offered in 70% of cases. This result demonstrates that scoring candidate keywords is effective. However, we found that the longer a candidate keyword is, the lower its score becomes. In the future, we plan to develop a method that checks whether a candidate keyword is related to the complaint topic word, in order to better exclude noise from the results.

For Q3, 75% of the answers indicated satisfaction with the contents of the advice found with the chosen queries. This result suggests that the proposed system could address users' dissatisfaction and that the recommended advice responds to the demands of different users. It also implies that the proposed system can reduce the burden on users when they search for advice, compared with a traditional search engine.

5 CONCLUSION
In this paper, we proposed a recommender system that analyzes complaint data to recommend suitable advice. We extracted query keywords from various user complaints about a service by calculating the score of each query. Then, suitable web pages containing advice are recommended from the results of the query. In addition, we evaluated the effectiveness and usability of the proposed system through a questionnaire survey, and the results show that the generated query keywords would be useful for collecting advice. The advice returned by the query keywords could also address users' dissatisfaction with a service and respond to different user demands in a comprehensive way.

In the future, we plan to evaluate the satisfaction with each query and analyze the result. Furthermore, we will consider new methods to obtain candidate query keywords that users would find hard to think of on their own, to enhance the usability of the proposed system.

ACKNOWLEDGMENTS
In this paper, we used the FKC Data Set provided for research purposes by the National Institute of Informatics in cooperation with Insight Tech Inc.

REFERENCES
[1] Kensuke Mitsuzawa, Maito Tauchi, Mathieu Domoulin, Masanori Nakashima, and Tomoya Mizumoto. "FKC Corpus: a Japanese Corpus from New Opinion Survey Service," In Proceedings of the Novel Incentives for Collecting Data and Annotation from People: types, implementation, tasking requirements, workflow and results, Portorož, Slovenia, pp. 11–18, May 2016.
[2] Tooru Hasegawa and Daisuke Kitayama. "The Visualization of Dissatisfaction Groups using Dissatisfaction Dataset," DEIM Forum 2017, P7-1. (In Japanese)
[3] Toshinori Hayashi, Yuanyuan Wang, Yukiko Kawai, and Kazutoshi Sumiya. "An E-Commerce Recommender System using Complaint Data and Review Data," Proc. of ACM IUI2018 Workshop on Web Intelligence and Interaction (WII 2018).
[4] Kazuo Utsumi, Takashi Inui, Taiichi Hashimoto, Koji Murakami, and Masamichi Ishikawa. "Extraction of Critical Knowledge concerning Social Problems and their Technological Solutions," Socio Technology Research Journal, vol. 6, pp. 187–198, Mar. 2009.
[5] T. Sakai and Ko Fujimura. "Discovering Latent Solutions from Expressions of Dissatisfaction in Blogs," Information Processing Society of Japan, vol. 52, no. 12, pp. 3806–3816, Dec. 2011.
[6] Taiichi Hashimoto, Koji Murakami, Takashi Inui, Kazuo Utsumi, and Masamichi Ishikawa. "Topic Extraction and Social Problem Detection based on Document Clustering," Socio Technology Research Journal, vol. 5, pp. 216–226, Mar. 2008.
[7] Tomoshi Yoshida and Daisuke Kitayama. "An Evaluation of Feature Term Extraction Method based on Polarity Analysis from Customer Reviews," IEICE-DE2016-5, vol. 116, no. 105, pp. 19–24, Jun. 2016.
[8] Honoka Kakimoto, Toshinori Hayashi, Yuanyuan Wang, Yukiko Kawai, and Kazutoshi Sumiya. "Query Keyword Extraction from Video Caption Data based on Spatio-Temporal Features," Lecture Notes in Engineering and Computer Science: Proceedings of the International MultiConference of Engineers and Computer Scientists 2018, pp. 405–408, Mar. 2018.
[9] Ximei Song and Masao Takaku. "Study on Navigation support System based on User's Search Intents," ARG WI2, no. 9, 2016.
[10] Tomoki Kajinami, Toshiyuki Ogasawara, Jhoji Komiya, and Yasufumi Takama. "Application of Keyword Map to Decision Support through Exploratory Search," 2008 IEEE International Conference on Systems, Man and Cybernetics, pp. 2177–2181, 2008.
[11] Liang Yang, Daisuke Kitayama, and Kazutoshi Sumiya. "Query Keyword Extraction from Complaint Data for Collecting Advice," Lecture Notes in Engineering and Computer Science: Proceedings of the International MultiConference of Engineers and Computer Scientists 2019, pp. 347–351, Mar. 2019.
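As an illustration of the noun importance score in Section 3.3 (Equation 1), the following is a minimal Python sketch, not the authors' implementation. It assumes that nouns have already been extracted from the reviews by a morphological analyzer; the function names, the `nouns_by_company` structure, and the toy data are hypothetical.

```python
from collections import Counter


def noun_importance(nouns_by_company, company):
    """Score each noun of `company` as (tf / |A|) * (tf / sum of tf over all companies).

    `nouns_by_company` maps a company name to the list of nouns extracted
    from its negative reviews (hypothetical input format).
    """
    counts = {name: Counter(nouns) for name, nouns in nouns_by_company.items()}
    target = counts[company]
    total_nouns = sum(target.values())  # |A|: all noun occurrences for this company
    scores = {}
    for noun, tf in target.items():
        tf_all = sum(c[noun] for c in counts.values())  # occurrences across all companies
        scores[noun] = (tf / total_nouns) * (tf / tf_all)
    return scores


def complaint_topic_words(nouns_by_company, company, threshold):
    """Return nouns whose importance exceeds `threshold`, highest score first."""
    scores = noun_importance(nouns_by_company, company)
    selected = [noun for noun, score in scores.items() if score > threshold]
    return sorted(selected, key=lambda noun: scores[noun], reverse=True)


# Toy data, invented for illustration only.
toy = {
    "A": ["delivery", "delivery", "price", "post"],
    "B": ["post", "post", "stamp", "price"],
}
print(complaint_topic_words(toy, "A", threshold=0.1))
```

In this toy example, "delivery" scores highest for company A because it is frequent there and absent elsewhere, while "post" is penalized because most of its occurrences belong to company B.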
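The scoring of candidate keywords in Section 3.4 (Equations 2 to 4 and Table 2) can be sketched in the same way. This is a hedged example under stated assumptions: a keyword is counted at most once per review through a plain substring match, the analysis label of each keyword is supplied by the caller, and all function names and sample reviews are hypothetical. Only the weights are taken from Table 2.

```python
# Weights from Table 2, keyed by the analysis label of a candidate keyword.
WEIGHTS = {
    "negative": 0.8,
    "verb": 0.7,
    "adjective": 0.7,
    "proper noun (place name)": 0.7,
    "proper noun (organization name)": 0.3,
    "common noun": 0.3,
    "verbal noun": 0.1,
}


def relevancy(keyword, reviews):
    """Share of `reviews` that mention `keyword` (0.0 for an empty list).

    Assumption: one count per review, detected by substring match.
    """
    if not reviews:
        return 0.0
    return sum(keyword in review for review in reviews) / len(reviews)


def final_score(keyword, label, company_reviews, topic_reviews):
    """Mean of the two relevancies multiplied by the Table 2 weight."""
    company_rel = relevancy(keyword, company_reviews)  # R_cd / R_c
    topic_rel = relevancy(keyword, topic_reviews)      # R_td / R_t
    return (company_rel + topic_rel) / 2 * WEIGHTS[label]


def advice_topic_words(scores, threshold):
    """Keywords scoring below `threshold`, lowest score first."""
    selected = [word for word, score in scores.items() if score < threshold]
    return sorted(selected, key=lambda word: scores[word])


# Invented reviews for one company; topic_reviews are those containing "point".
company_reviews = [
    "point expired",
    "delivery was late",
    "point charge failed",
    "price too high",
]
topic_reviews = [r for r in company_reviews if "point" in r]
print(final_score("charge", "verbal noun", company_reviews, topic_reviews))
```

Because lower scores are treated as better, a keyword that rarely appears in complaints and carries a small weight (for example a verbal noun) is the most likely to be kept as an advice topic word.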
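For the query generation step in Section 3.5, a small sketch may help show how one company name, one complaint topic word, and one advice topic word are combined. The paper does not specify how the OR-based search is issued, so this example only builds the three-term query strings; the function name, the `limit` parameter, the company label, and the scores are hypothetical, while the threshold 0.0043 is the value reported for company A in Section 4.1.

```python
def build_queries(company, topic_word, advice_scores, threshold, limit=5):
    """Build 'company topic advice' query strings, lowest-scoring advice word first.

    `advice_scores` maps each candidate keyword to its final score; only
    keywords scoring below `threshold` are used, and at most `limit`
    queries are returned.
    """
    chosen = sorted(
        (word for word, score in advice_scores.items() if score < threshold),
        key=lambda word: advice_scores[word],
    )
    return [f"{company} {topic_word} {word}" for word in chosen[:limit]]


# Hypothetical scores for candidate keywords of the complaint topic word "point".
queries = build_queries(
    "CompanyA",
    "point",
    {"how to use": 0.001, "charge": 0.003, "late": 0.2},
    threshold=0.0043,
)
print(queries)
```

How a search engine combines the three terms is outside this sketch and would depend on the search API that is actually used.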
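The measures reported in Section 4.3 (p@k, r@k, and F-measure) can be computed as in the sketch below. It assumes the standard definitions of precision and recall at k and takes the F-measure to be the harmonic mean of the two; the vote threshold follows the paper's rule that an item chosen in more than five answers is a true positive. Function names and the sample votes are hypothetical.

```python
def true_positives(votes, min_votes=5):
    """Items chosen in more than `min_votes` answers (the paper's true positives)."""
    return {item for item, count in votes.items() if count > min_votes}


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(item in relevant for item in ranked[:k]) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items found in the top k (0.0 if none are relevant)."""
    if not relevant:
        return 0.0
    return sum(item in relevant for item in ranked[:k]) / len(relevant)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example: three ranked queries and how many respondents chose each.
ranked = ["q1", "q2", "q3"]
votes = {"q1": 8, "q2": 3, "q3": 6}
relevant = true_positives(votes)
print(precision_at_k(ranked, relevant, 1), recall_at_k(ranked, relevant, 1))
```

Applied to the averaged values in Table 6, this harmonic mean gives about 0.57 for k = 3 and 0.67 for k = 5, matching the table; for k = 1 it gives about 0.29 rather than 0.28, presumably because the table entries are rounded.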