Proposal and Evaluation of a Serendipitous Recommendation Method Using General Unexpectedness

Takayuki Akiyama, Kiyohiro Obara, Masaaki Tanizaki
Hitachi, Ltd., Central Research Laboratory
1-280, Higashi-Koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan
Tel: +81-42-323-1111 (ext. 4302 / 3612 / 4068)
takayuki.akiyama.hv@hitachi.com, kiyohiro.obara.pc@hitachi.com, masaaki.tanizaki.tj@hitachi.com

ABSTRACT
Recommender systems support users in selecting items and services in an information-rich environment. Although recommender systems have improved in terms of accuracy, they are still insufficient in terms of novelty and serendipity, leaving users unsatisfied. Two methods of "serendipitous recommendation" are therefore proposed. Until now, no method has recommended serendipitous items to users accurately, because what kinds of items are serendipitous has not been clearly defined. Accordingly, a user-preference model of serendipitous items was devised on the basis of actual impression data collected from users by questionnaire. Two serendipitous recommendation methods based on this model were then devised and evaluated against users' actual impressions. The evaluation results show that one of these methods, the one using "general unexpectedness" independent of user profiles, can recommend serendipitous items accurately.

Categories and Subject Descriptors
H.1.2 [Models and Principles]: User/Machine Interface – Human information processing.

General Terms
Human Factors

Keywords
Recommender systems, user preference, content-based, serendipity, unexpectedness.

Copyright is held by the author/owner(s). Workshop on the Practical Use of Recommender Systems, Algorithms and Technologies (PRSAT 2010), held in conjunction with RecSys 2010. September 30, 2010, Barcelona, Spain.

1. INTRODUCTION
In recent years, the amount of information accessible to users has been increasing and diversifying because of the growth of information technology and the expansion of its commercial use. Under these circumstances, although users can choose from a wide variety of items (such as information, TV programs, and books), they cannot pick the best ones out of a vast collection that includes many useless items.

To solve this problem, so-called "recommender systems"—which recommend suitable items by monitoring a user's actions and extracting information about the user's preferences—are becoming necessary for "item-providing services" such as Internet shopping sites and department stores. In the future, recommender systems will recommend items by monitoring all of a user's preferences. Users will get information suited to their needs and will have opportunities to discover new items. Moreover, service providers will be able to provide services continuously because users will use their systems more frequently.

Recommendation technology is one way to retrieve information that suits a user's preferences. In information-retrieval theory, useful information is categorized into two types: information that users recognize as useful, and information that users do not recognize as useful but that actually is useful [1]. We suppose that the items users like fall into the same two types; accordingly, in this paper, items of the second type are defined as "serendipitous items."

In general, typical recommender systems use one of two strategies: a content-based approach or collaborative filtering [2]. The content-based approach recommends items similar to the user's selected items by calculating the similarity between items, using feature vectors extracted from the user's selection record. Collaborative filtering recommends items selected by other users whose selection histories are similar to the relevant user's, by calculating the similarity between users' records.

Both strategies recommend items similar to the ones the user selected before. These items belong to the first type stated above because users already recognize them as interesting. For example, a typical recommender suggests TV programs featuring actor A to users who frequently watch programs featuring actor A. Consequently, a user might get bored with typical recommendation because it always recommends similar items that the user already knows are interesting [2]. For that reason, recommending items of the second type—serendipitous ones—becomes necessary. For example, a serendipitous recommender suggests educational programs featuring performer A to users who do not usually watch educational programs but frequently watch performer A. Nevertheless, typical recommendation methods cannot recommend such serendipitous items preferentially.

The purpose of this study is to realize serendipitous recommendation. Accordingly, actual data on items that users recognized as "serendipitous" was collected, and a user-preference model was established from it. Serendipitous recommendation methods based on that model were then devised and evaluated with the actual data. The results of this evaluation verified the effectiveness of a serendipitous recommendation method using "general unexpectedness," which is independent of a user's profile.

2. RELATED WORKS AND MOTIVATION
In the early stages of developing recommender systems, the accuracy of recommending first-type items was improved, and it was thought that this improvement alone was enough to enhance user satisfaction. However, it is now recognized that novelty and serendipity are important factors in satisfying users, aside from simple suitability to a user's preference [2, 3, 4, 5].

There are several related works on serendipitous recommendation.
Ziegler et al. supposed that serendipitous items appear more often in recommendation lists containing diverse items from different categories than in lists of similar items, and they proposed a recommendation method that increases the diversity of recommendation lists [6, 7]. They defined "intra-list similarity" as the similarity between all items in a recommendation list, computed from pairwise item similarities, and they increased diversity by inserting low-similarity items.

Approaches that recommend serendipitous items directly have also been proposed. Hijikata et al. proposed a method for improving novelty and serendipity by estimating the probability that an item is already known, using information about known and unknown items given explicitly by the user [8]. Another method estimates a probabilistic "degree of interest" from a user's evaluations of selected items (namely, "interested" or "not interested"); items whose degree-of-interest probabilities are nearly balanced are taken as serendipitous and recommended [9]. Yet another method regards items that differ from the ones a user habitually consumes as serendipitous and recommends those [10]. It uses a preference model to predict items the user likes and a habit model to predict items the user consumes habitually, and then builds a recommendation list containing serendipitous items by predicting unexpectedness from the differences between the outputs of the two models.

As mentioned above, the serendipitous recommendation methods proposed so far are based on the researchers' own assumptions; no method has been based on actual data about users' impressions of selected items. Moreover, many works treat serendipitous items simply as unexpected items and do not restrict them to items that are both unexpected and interesting.

In this study, the authors clarified what kinds of items are actually serendipitous by collecting data on users' actual impressions, made assumptions based on that data, and devised two serendipitous recommendation methods based on those assumptions.

3. MODELING SERENDIPITOUS ITEMS ACCORDING TO ANALYSIS OF ACTUAL DATA

3.1 User-preference model
An assumption about user preference was established first, and what kinds of items are serendipitous for users was then verified by analyzing users' actual impressions collected by questionnaires based on this assumption. The user-preference model established before the questionnaires were given is explained in the following. Figure 1 shows the concept of the model. In this model, items are arranged in a feature vector space generated from item features. Although this feature vector space is high-dimensional, a two-dimensional space is used in Figure 1 for simplicity. Items that a user selected before lie in the area near the feature vectors that the user recognizes and knows are interesting (called "recognized items" below, because the user recognizes them as interesting and is not surprised if they are recommended). In an area somewhat distant from a recognized area, serendipitous items (namely, surprising and interesting items) are supposed to exist, and in areas far from the recognized area, not-interesting items are supposed to exist. Broadly speaking, each user is supposed to have several recognized areas in the feature vector space, because there may be several reasons why the user selected certain interesting items; for example, the reasons for selecting a drama and a documentary program may differ.

Fig. 1: Concept of the user-preference model (recognized, serendipitous, and not-interesting areas of items in feature vector space)

3.2 Questionnaire
To collect users' actual impressions, a questionnaire was given to thirty users. The method is as follows. First, each user reads the information for a TV program selected randomly from the TV programs broadcast over three months (31,433 programs) and then classifies the program, treated as a recommended item, into one of three categories: "recognized program" (first-type item), "serendipitous program" (second-type item), or "not-interesting program." An electronic program guide (EPG) is used to provide the information about each TV program, which includes its title, performers, and other program contents.

"Recognized program" means a program that the user can expect from his or her own preferences, for example, a program the user frequently watches.
"Serendipitous program" means a program that the user feels is interesting and surprising when recommended, for example, a program the user would not expect from his or her own preferences but finds interesting. "Not-interesting program" means a program the user is not interested in even if it is recommended.

Answering the questionnaire takes a long time (about one minute per program evaluation), so each user answered it over one month, giving ten to one hundred answers per day. We supposed that a user's preferences do not change much over one month, because a series of TV programs typically runs for about three months. All users live in Japan; twenty-six work at Hitachi, Ltd., Central Research Laboratory, and four are university students. Twenty-five are male and five are female. Fifteen are from twenty to thirty years old, eleven from thirty to forty, and the other four from forty to fifty. Each user evaluated about one thousand to five thousand programs in total.

3.3 Analysis method
The programs collected by the questionnaire are first converted into term vectors by morphological analysis of the text information in the EPG. Each vector component takes one of two values, indicating whether the EPG text includes the term or not. The recognized programs are then clustered to estimate the recognized areas. For clustering, the distance between programs P_i and P_j is defined as

distance(P_i, P_j) = \sum_{n=1}^{N} w_n |P_i(n) - P_j(n)|    (1)

where P_i(n) is the vector component for the nth term of program P_i, that is, whether program P_i includes the nth term (1) or not (0), and w_n is the user's weight (a metric of the user's preference) for the nth term. A user-specific distance between programs is obtained by introducing the user's weights w_n.

The weight w_n of the nth term v is calculated by TFIDF (the product of term frequency and inverse document frequency) [11]. TFIDF weights the characteristic terms of the observed documents by their frequency within the observed group relative to all groups. This metric is used to weight a user's preference as follows:

w_n = tfidf(v | D) = tf(v | D) \times \log(N_{all} / N(v))    (2)

Here, D represents the observed programs, i.e., the recognized programs; tf(v|D) is the frequency with which term v occurs in D; N_all is the total number of programs; and N(v) is the number of programs in which term v occurs.
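For concreteness, Equations (1) and (2) could be computed as in the following sketch. It assumes each program is represented as the set of terms extracted from its EPG text and that the recognized programs are drawn from the same corpus as all programs; it is a minimal reading of the formulas, not the authors' implementation.

```python
import math
from collections import Counter

def tfidf_weights(recognized_programs, all_programs):
    """Eq. (2): weight each term by its frequency in the user's recognized
    programs and its rarity over the whole EPG corpus.
    Programs are assumed to be sets of terms from morphological analysis."""
    n_all = len(all_programs)
    doc_freq = Counter(term for prog in all_programs for term in prog)   # N(v)
    tf = Counter(term for prog in recognized_programs for term in prog)  # tf(v|D)
    return {v: tf[v] * math.log(n_all / doc_freq[v]) for v in tf}

def distance(p_i, p_j, weights):
    """Eq. (1): weighted L1 distance between two binary term vectors.
    Terms with no learned weight contribute nothing (weight 0)."""
    return sum(w for term, w in weights.items() if (term in p_i) != (term in p_j))
```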
3.4 Results
Figure 2 shows the number of programs evaluated by each user, and Figure 3 shows the ratio of recognized programs and serendipitous programs among all evaluated programs. Although the ratios vary from user to user, it is clear that the rates of recognized programs are very low and that most programs are not interesting. It is also clear that users who frequently watch TV evaluate more programs as recognized than as serendipitous, whereas users who rarely watch TV evaluate more programs as serendipitous than as recognized.

Fig. 2: Number of evaluated programs for each user (vertical axis: number of evaluated programs; horizontal axis: user ID)

Fig. 3: Ratio of recognized and serendipitous programs among all programs for each user (vertical axis: ratio of each program type; horizontal axis: user ID)

In regard to the questionnaire, most users said they feel serendipity about programs that they did not know before but find interesting (for example, interesting educational programs for users who do not watch educational programs) and about programs containing an unexpected combination of interesting features (for example, educational programs featuring a comedian). However, surprising programs are not always unexpected programs, so the meaning of "surprising" likely includes other factors. Moreover, some users evaluated no programs as serendipitous, and some users could not classify programs into the three types; consequently, it is difficult to evaluate their subjective impressions quantitatively.

A clustering result of recognized programs is shown as a dendrogram in Figure 4. The clustering method used is hierarchical clustering. The height of a cluster is the average distance, calculated from Equation (1), between the programs belonging to the cluster and the cluster center. The number of recognized areas is determined by cutting the dendrogram at a certain height.

Fig. 4: Clustering result of recognized programs (leaf nodes: recognized programs; vertical axis: cluster height; average-linkage hierarchical clustering)

Figure 5 shows, as a function of cluster height, the ratio between the average distance of recognized programs from the nearest cluster center (the radius of the recognized area) and the average distance of not-interesting programs from the nearest cluster center (the radius of the not-interesting area). When the number of clusters increases, not-interesting programs are distributed outside the recognized areas. On the other hand, when the number of clusters decreases, not-interesting programs fall inside the recognized areas because the number of clusters is smaller than the true number of recognized areas.

Fig. 5: Ratio of the radii of the recognized area and the not-interesting area versus cluster height (denominator: radius of the not-interesting area)

Figure 6 shows, again as a function of cluster height, the ratio between the average distance of serendipitous programs from the nearest cluster center (the radius of the serendipitous area) and the radius of the recognized area. As the number of clusters increases, serendipitous programs are distributed outside the recognized areas.

Fig. 6: Ratio of the radii of the recognized area and the serendipitous area versus cluster height (denominator: radius of the recognized area)

Figure 7 plots the results of Figures 5 and 6 together. It indicates that not-interesting programs are distributed outside the recognized area, and serendipitous programs are distributed far outside the recognized area.

Fig. 7: Ratio of the radii of the not-interesting, recognized, and serendipitous areas versus cluster height (denominator: radius of the not-interesting area)
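As a concrete illustration of the clustering analysis above, the following SciPy-based sketch groups a user's recognized programs with average-linkage hierarchical clustering under the distance of Equation (1) and estimates a center and radius for each recognized area. The binary term matrix, cutting by a target cluster count rather than by height, and the use of the component-wise mean as a cluster center are illustrative assumptions, not details given in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def recognized_areas(binary_matrix, term_weights, n_clusters=10):
    """Cluster a user's recognized programs (rows of a 0/1 term matrix) by
    average-linkage hierarchical clustering under the weighted distance of
    Eq. (1), then estimate each recognized area's center and radius."""
    X = np.asarray(binary_matrix, dtype=float)
    w = np.asarray(term_weights, dtype=float)
    pairwise = pdist(X, metric=lambda a, b: np.sum(w * np.abs(a - b)))     # Eq. (1)
    tree = linkage(pairwise, method="average")                             # dendrogram as in Fig. 4
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")            # cut the dendrogram
    areas = []
    for c in np.unique(labels):
        members = X[labels == c]
        center = members.mean(axis=0)                  # assumed definition of the area center
        radius = np.mean(np.abs(members - center) @ w) # average Eq.-(1) distance to the center
        areas.append((center, radius))
    return areas
```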
3.5 Model based on analysis results
To summarize the results presented in this section: in the feature vector space generated from the EPG texts, not-interesting programs are distributed outside the recognized area, and serendipitous programs are distributed far outside the recognized area. This result does not support the assumption shown in Figure 1. We therefore suggest the structure of user preference shown in Figure 8 instead of that shown in Figure 1.

The distance from the center of a recognized area essentially counts the terms that are in the program vector but not in the center, because the program-vector components take only two values, indicating whether each term appears in the program contents or not. In addition, the term weights are calculated from the user's preference by TFIDF. Therefore, even if a program includes many low-weight terms and is rarely watched, its distance from the recognized area is not large, whereas if a program includes high-weight terms belonging to another recognized area, its distance from the nearest recognized area becomes large. Consequently, programs that include many high-weight terms belonging to other recognized areas, and that are not similar to the programs in the nearest recognized area, fall in the intermediate region between recognized areas, and users treat them as serendipitous. This assumption reflects the comment made by some users in the questionnaire that an unexpected combination of program contents makes them feel serendipity.

Fig. 8: User-preference model based on the analysis results (serendipitous items in the intermediate regions between recognized areas; not-interesting items elsewhere)

Figure 9 shows the distribution of each type of program plotted against the distance from one center of a recognized area. The solid line is the distribution of not-interesting programs, the dotted line that of serendipitous programs, and the dashed line that of recognized programs. The peak of recognized programs nearest the origin corresponds to the distribution of that recognized area, and the next-nearest peak corresponds to the other recognized areas. As shown in Fig. 7, not-interesting programs are distributed broadly over both the recognized area and the serendipitous area; consequently, it is difficult to distinguish serendipitous programs accurately using only the distance between programs given by Equation (1).

Fig. 9: Density of programs in each category versus distance from the center of a recognized area (vertical axis: density of programs; horizontal axis: distance; solid line: not-interesting programs; dotted line: serendipitous programs; dashed line: recognized programs)
4. PROPOSAL AND EVALUATION OF RECOMMENDATION METHODS

4.1 Proposed methods

4.1.1 Using distance between items
The distance between items used in this method is calculated from Equation (1) and therefore reflects the user's preference. First, the proposed recommender system learns the features of programs from the user's viewing history. In the same way as described in Section 3, each program is defined by a term vector whose components take two values. Second, the system splits the watched programs (i.e., the recognized programs) into several clusters by hierarchical clustering and finds the centers of the recognized areas. The number of recognized areas is set to 7 to 10 according to the results of the questionnaire. The system then calculates the distance of each not-watched program from the nearest center and recommends the ten most distant programs. In short, the system recommends the ten programs with the highest scores calculated according to

Score(P_i) = distance(P_i, C_{nearest})    (3)

Here, C_nearest is the center of the recognized area nearest to program P_i. This method may not recommend serendipitous programs accurately, because not-interesting programs are also distributed broadly. It is referred to as the "first method" hereafter.

4.1.2 Using general unexpectedness
This method (hereafter, the "second method") introduces the "unexpectedness of programs" in addition to the distance used in the first method in order to capture the "surprising" factor. The questionnaire results indicate that serendipitous programs have an unexpected aspect for users, as shown in Fig. 8. It is assumed that "unexpectedness" means that the program contents are hard to predict; for a program-recommendation system, it is assumed to be related to an unlikely combination of features. The second method treats highly unexpected and interesting programs as serendipitous. A general metric of how difficult a program is to expect, common to every user, is defined by the sum of the co-occurrence tendencies of the terms in the program:

Expectedness(P_i) = \frac{1}{|P_i|} \sum_{v,w \in P_i} TendencyOfCoOccurrence(v, w) = \frac{1}{|P_i|} \sum_{v,w \in P_i} \frac{N_{vw}}{N_v + N_w - N_{vw}}    (4)

TendencyOfCoOccurrence(v, w) is the tendency of terms v and w to co-occur across all programs; it makes it possible to evaluate quantitatively how unexpected a program is for users. N_v is the number of programs that include term v, N_vw is the number of programs that include both terms v and w, and |P_i| is the number of terms in program P_i, used as a normalization factor. If the co-occurrence of the terms is low, the expectedness is low and the program is highly unexpected, so users would be unlikely to find it by themselves. Unexpectedness is defined as the inverse of expectedness:

Unexpectedness(P_i) = 1 / Expectedness(P_i)    (6)

The ten programs with the highest scores, calculated as a weighted sum of the squared distance and the squared unexpectedness, are then recommended:

Score(P_i) = \alpha \times distance(P_i, C_{nearest})^2 + (1 - \alpha) \times Unexpectedness(P_i)^2    (5)

The parameter α controls the balance between the user's preference (distance) and the unexpectedness of programs; Equation (5) is simply a linear combination of the squares of the two quantities.
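To make the computation concrete, the following sketch implements Equations (4)–(6) and the combined score of Equation (5). It assumes the co-occurrence counts N_v and N_vw have already been tabulated over all programs; this is one reading of the formulas, not the authors' code.

```python
from itertools import combinations

def expectedness(program_terms, n_v, n_vw):
    """Eq. (4): sum of co-occurrence tendencies over all term pairs in the
    program, normalized by |Pi| (the number of terms, as in the paper).
    n_v[v] is the number of programs containing term v; n_vw[(v, w)] is the
    number of programs containing both, keyed by sorted term pairs."""
    terms = sorted(set(program_terms))
    total = 0.0
    for v, w in combinations(terms, 2):
        both = n_vw.get((v, w), 0)
        union = n_v[v] + n_v[w] - both
        total += both / union if union else 0.0
    return total / len(terms) if terms else 0.0

def score(dist_to_nearest_center, program_terms, n_v, n_vw, alpha=0.05):
    """Eq. (5) with Eq. (6): combine the squared distance from the nearest
    recognized-area center with the squared unexpectedness.  alpha = 0.05 is
    the value used in Section 4.3; alpha = 1 ranks programs like the first
    method (Eq. (3), distance only), and alpha = 0 gives the
    unexpectedness-only baseline."""
    e = expectedness(program_terms, n_v, n_vw)
    unexpectedness = 1.0 / e if e > 0 else float("inf")   # Eq. (6)
    return alpha * dist_to_nearest_center ** 2 + (1.0 - alpha) * unexpectedness ** 2
```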
4.2 Evaluation method

4.2.1 Dataset
The results of the questionnaire described in Section 3 were used. Data from the fourteen users who classified more than 100 programs as recognized or serendipitous were selected, because it was supposed that serendipitous recommendation becomes necessary after a user has watched TV programs for about one month. (It was assumed that users get bored with typical recommendation after about one month and that most users watch about fifty TV programs per month.) Each of these users evaluated from 1000 to 5000 programs, and the ratio of serendipitous programs among all evaluated programs is 7 to 8%.

4.2.2 Procedure
The proposed methods and the baseline methods are applied to each user as follows. First, the system learns the recognized areas from fifty recognized programs; in this evaluation experiment, 50 programs evaluated as recognized were sampled randomly as a training set. Next, using the learned recognized areas, the system recommends ten high-scoring programs from the remaining evaluated programs for each user, using each of the proposed methods, random recommendation, and a method using only unexpectedness. Random recommendation recommends ten programs at random, and the unexpectedness-only method scores programs by unexpectedness alone (α = 0 in Equation (5)). This experiment was performed ten times, each time with a different random set of recognized programs, and the accuracy of each method was compared.

4.2.3 Evaluation metrics
Our purpose is to recommend serendipitous programs, so we use detection rate and precision as metrics for evaluating how accurately the proposed methods detect them. Detection rate is the probability of detecting a serendipitous program, and precision is the ratio of serendipitous programs in the recommendation list.
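A possible implementation of this protocol is sketched below. The `recommend` callback stands in for any of the scoring methods above, and reading the detection rate as the fraction of runs in which at least one serendipitous program appears in the ten-item list is our interpretation; the paper does not spell the computation out.

```python
import random

def evaluate(recommend, evaluated, n_runs=10, n_train=50, list_size=10, seed=0):
    """Evaluation loop as described in Section 4.2.2: sample 50 recognized
    programs as training data, recommend a 10-program list from the rest,
    and average the metrics over 10 runs.
    `evaluated` maps program IDs to the questionnaire label ("recognized",
    "serendipitous", or "not_interesting")."""
    rng = random.Random(seed)
    recognized_ids = [pid for pid, label in evaluated.items() if label == "recognized"]
    detection = precision = 0.0
    for _ in range(n_runs):
        train = set(rng.sample(recognized_ids, n_train))
        candidates = [pid for pid in evaluated if pid not in train]
        top = recommend(train, candidates, list_size)
        hits = sum(1 for pid in top if evaluated[pid] == "serendipitous")
        detection += (1.0 if hits > 0 else 0.0) / n_runs   # assumed: >=1 hit counts as detection
        precision += hits / (list_size * n_runs)           # share of serendipitous programs in the list
    return detection, precision
```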
4.3 Results
Table 1 lists the evaluation results of the two proposed methods, random recommendation, and the unexpectedness-only method. The accuracy metrics are averaged over users. Parameter α is set to 0.05, the value at which the second method achieves its highest accuracy.

Table 1: Accuracy results

Method               Random   First   Second   Only unexpectedness
Detection rate [%]   51.9     49.8    78.2     32.8
Precision [%]        7.98     7.51    21.6     5.21

The results in Table 1 show that the detection rate and precision of random recommendation are low, which suggests how difficult it is to recommend serendipitous programs. On the other hand, the accuracy of the second method (i.e., the one using the unexpectedness of programs) is higher than that of the other methods, with a detection rate of 78.2% and a precision of 21.6%. This result means that the second method recommends serendipitous programs accurately.

While the accuracy of the first method (i.e., using distance only) is about the same as that of random recommendation, the accuracy of the second method is much higher, and the accuracy of the unexpectedness-only method is lower than that of random recommendation. This shows that serendipitous programs can be recommended by using both the distance reflecting a user's preference and the unexpectedness of programs. The first method recommends programs that include not-interesting ones, because it simply recommends items that are not similar to the recognized programs. The second method, in contrast, uses unexpectedness to distinguish "unexpected and interesting programs" from "unexpected but not-interesting programs" among the low-similarity programs; consequently, its accuracy is high.

Figure 10 shows the concept of user preference in terms of distance and unexpectedness inferred from these results. Serendipitous programs and not-interesting programs are both distant from the recognized area. According to the "only unexpectedness" result in Table 1, serendipitous programs lie in regions of especially high unexpectedness, because they tend to contain more combinations of terms whose co-occurrence tendency is low. Moreover, not-interesting programs may exist in the lower-right box (high unexpectedness but close to the recognized area): it is quite possible that a user already knows the highly unexpected programs near the recognized programs and does not select them, because "unexpectedness" is a general metric that does not depend on the user's record.

Fig. 10: Concept of user preference in terms of distance and unexpectedness (vertical axis: distance from the recognized area; horizontal axis: unexpectedness; serendipitous programs lie in the high-distance, high-unexpectedness region; not-interesting programs occupy the remaining distant regions; recognized programs lie near the origin)

The unexpectedness of programs introduced here is calculated from the co-occurrence tendency of terms in the programs. Users find programs, for example, by reading TV guides and EPGs on web sites, so programs whose contents are rare in TV guides are supposed to be serendipitous. Because TV guides and EPGs are provided not by users but by users' surroundings, we simply introduce unexpectedness independently of a user's characteristics. The examinees in this experiment are deemed to live in similar environments; the influence of unexpectedness for users living in totally different environments (e.g., in different countries) might be significant. Unexpectedness may therefore reflect how frequently a user has so far come into contact, intentionally or not, with items similar to the relevant item.

Finally, our proposed method is compared with the other related methods. It is hard to compare them by accuracy because serendipity depends on users' subjective impressions, so we compare them by their requirements in Table 2.

Table 2: Comparison of serendipitous recommendation methods

Requirement               Proposal      Different from habit   Collaborative   Different from interesting & not
User's impression         Unnecessary   Unnecessary            Unnecessary     Necessary
Other users' records      Unnecessary   Unnecessary            Necessary       Unnecessary
User's habit              Unnecessary   Necessary              Unnecessary     Unnecessary
Information of programs   Necessary     Necessary              Unnecessary     Necessary

The related methods require some information about users: one requires a user's impressions of recommended items, another requires other users' records, and another requires the user's habits. The proposed method requires only a few evaluation values to learn a user's preference and does not depend on the user's surroundings. On the other hand, it requires information about programs, but much information about programs is now available on Internet reference sites such as Wikipedia. In short, the proposed method has the broadest utility across various systems because it is useful on both devices and servers. As future work, however, which method actually satisfies users must be verified by users' subjective evaluations. In addition, we suggest using terms suitable for each user.
5. FUTURE WORK
Although the accuracy of our proposed serendipitous recommendation method was verified, three tasks remain as future work: improving accuracy, evaluating with more users, and tuning the performance of an actual system.

To improve accuracy, it is necessary to select the recognized areas outside of which many serendipitous programs exist; in fact, there are some recognized areas outside of which serendipitous programs do not exist. By considering the radius of each recognized area and the number of programs it contains, it should be possible to select the best recognized areas. Another approach to improving accuracy is to obtain richer information about programs from metadata and web sites.

It is also necessary to satisfy users by capturing user context from spatial and temporal information; for example, a user may not want to watch a program in the morning but in the evening instead. It is likewise important to capture time-dependent user preferences; for example, users feel serendipity if a recommended program has not been watched recently but was watched in the past. With our recommendation method, a user's preference is described in a feature vector space generated from the user's selection history, so the structure of the space and the distribution of user preference depend on time.

To make the user-preference model statistically strong, it is necessary to evaluate the proposed method with more users, because the concept of serendipity is supposed to depend strongly on users' subjective impressions. Moreover, it is important to establish methods for evaluating a user's satisfaction quantitatively.

To introduce our recommendation method into an actual system, it is necessary to design an optimal data structure and to speed up the method.

In this study, we verified the recommendation method using TV programs, but the approach can also be applied to recommending items such as books and DVDs from a user's record of selecting TV programs. We plan to use this approach to capture what users like and dislike by collecting and analyzing users' records.

6. CONCLUSION
To realize serendipitous recommendation, a recommendation method that extracts a user's preference was proposed and evaluated. In particular, based on actual data obtained by giving a questionnaire to thirty users, a user-preference model using the distance between programs was established. Based on this model, a serendipitous recommendation method using both the distance and the unexpectedness of programs was proposed. This method recommends serendipitous programs accurately, with a detection rate of 78.2%. Moreover, it was found that the impression of unexpectedness depends on a user's living environment rather than on his or her character. This result is important for understanding a user's preferences in principle.

7. REFERENCES
[1] E. Toms: Serendipitous Information Retrieval, Proc. of the DELOS Workshop, 2000.
[2] J. Herlocker, et al.: Evaluating Collaborative Filtering Recommender Systems, ACM Transactions on Information Systems, Vol. 22, No. 1, pp. 5-53, 2004.
[3] K. Swearingen and R. Sinha: Beyond Algorithms: An HCI Perspective on Recommender Systems, ACM SIGIR Workshop on Recommender Systems, 2001.
[4] S. M. McNee, J. Riedl, and J. A. Konstan: Making Recommendations Better: An Analysis Model for Human-Recommender Interaction, Proc. of ACM SIGCHI, pp. 997-1101, 2006.
[5] S. M. McNee, J. Riedl, and J. A. Konstan: Being Accurate is Not Enough: How Accuracy Metrics have Hurt Recommender Systems, Proc. of ACM SIGCHI, pp. 997-1101, 2006.
[6] C.-N. Ziegler, G. Lausen, and L. Schmidt-Thieme: Taxonomy-driven Computation of Product Recommendations, Proc. of the 2004 ACM CIKM Conference on Information and Knowledge Management, pp. 406-415, 2004.
[7] C.-N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen: Improving Recommendation Lists Through Topic Diversification, Proc. of the World Wide Web Conference, pp. 22-32, 2005.
[8] Y. Hijikata, T. Shimizu, and S. Nishida: Discovery-oriented Collaborative Filtering for Improving User Satisfaction, Proc. of the 14th ACM International Conference on Intelligent User Interfaces (ACM IUI 2009), pp. 67-76, 2009.
[9] L. Iaquinta, M. de Gemmis, P. Lops, G. Semeraro, M. Filannino, and P. Molino: Introducing Serendipity in a Content-based Recommender System, Proc. of the Eighth International Conference on Hybrid Intelligent Systems (HIS '08), pp. 168-173, 2008.
[10] T. Murakami, et al.: A Method to Enhance Serendipity in Recommendation and its Evaluation, Transactions of the Japanese Society for Artificial Intelligence, Vol. 24, Issue 5, pp. 428-436, 2009.
[11] L.-P. Jing, et al.: Improved Feature Selection Approach TFIDF in Text Mining, Proc. of the First International Conference on Machine Learning and Cybernetics, 2002.