<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Proposal and Evaluation of Serendipitous Recommendation Method Using General Unexpectedness</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Takayuki Akiyama</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kiyohiro Obara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Masaaki Tanizaki</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hitachi, Ltd., Central Research, Laboratory</institution>
          ,
          <addr-line>1-280, Higashi-Koigakubo</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <volume>30</volume>
      <issue>2010</issue>
      <abstract>
        <p>Recommender systems support users in selecting items and services in an information-rich environment. Although recommender systems have been improved in terms of accuracy, such systems are still insufficient in terms of novelty and serendipity, giving unsatisfactory results to users. Two methods of “serendipitous recommendation” are therefore proposed. However, a method for recommending serendipitous items accurately to users does not yet exist, because what kinds of items are serendipitous is not clearly defined. Accordingly, a human preference model of serendipitous items based on actual data concerning a user's impression collected by questionnaires was devised. Two serendipitous recommendation methods based on the model were devised and evaluated according to a user's actual impression. The evaluation results show that one of these recommendation methods, the one using general unexpectedness independent of user profiles, can recommend the serendipitous items accurately.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender systems</kwd>
        <kwd>serendipity</kwd>
        <kwd>unexpected</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.1.2 [Models and Principles]: User/Machine Interface – Human
factors, Human information processing.</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>In recent years, the amount of information accessible to users is
increasing and becoming more diversified because of the growth
of information technology and the expansion of commercial use
of IT. Under this circumstance, although users can select various
items (such as information, TV programs, and books) they cannot
select the best of those items from a vast amount of items</p>
      <p>user
preference,
content-based,
including many useless items.</p>
      <p>To solve this problem, so-called “recommender systems”—for
recommending suitable items to users by monitoring a user’s
action and extracting information concerning a user’s
preferences—are becoming necessary for “item-providing
services” such as internet shopping sites and department stores. In
the future, recommender systems will recommend items by
monitoring all a user’s preferences. Users will get information
suitable for their needs, and they will have an opportunity to
discover new items. Moreover, service providers will be able to
provide services continuously because users will use their systems
more frequently.</p>
      <p>
        Recommendation technology is one way to retrieve information
that suits a user’s preferences. In information-retrieval theory,
useful information is categorized as two types: that which users
recognize as useful, and that which users do not recognize as
useful but is actually useful [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We suppose that the items users
like are categorized as the same two types; accordingly, in this
paper, the second type of items is defined as “serendipitous items.”
In general, typical recommender systems use either of two
strategies: a content-based approach or collaborative filtering [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
The content-based approach recommends items similar to users
selected items by calculating the similarity between items by
using feature vectors generated by extraction of a user’s selection
record. Collaborative filtering recommends items selected by
multiple users whose selection histories are similar to the relevant
user by calculating similarity between users’ records.
      </p>
      <p>
        These two methods recommend items similar to the ones that the
user selected before. These items belong to the first type stated
above because they are recognized as interesting items by users.
For example, a typical recommendation recommends TV
programs featuring actor A to users who frequently watch TV
programs featuring actor A. Consequently, a user might get bored
with typical recommendation because it always recommends
similar items that a user already knows are interesting [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For that
reason, recommending items belonging to the second type–
namely, serendipitous ones—become necessary. For example,
serendipitous recommendation recommends educational programs
featuring performer A to users who do not usually watch
educational programs but frequently watch performer A.
Nevertheless, typical recommendation methods
recommend such serendipitous items preferentially.
cannot
The purpose of this study is to realize serendipitous
recommendation. Accordingly, actual data that users recognized
as “serendipitous” was collected, and a user-preference model was
established first. Serendipitous recommendation methods based
on that model were devised and evaluated with actual data. The
results of this evaluation verified the effectiveness of a
serendipitous recommendation method using “general
unexpectedness” that is independent from a user’s profile.
2. RELATED WORKS AND MOTIVATION
In the early stage of developing recommendation systems, the
accuracy of recommendation of the first-type items was improved.
It was thought that this improved accuracy was enough to enhance
user satisfaction. However, it is recognized that novelty and
serendipity are important factors in satisfying a user, aside from
simply suitability to a user’s preference [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2, 3, 4, 5</xref>
        ].
      </p>
      <p>
        There are several related works on serendipitous recommendation.
Ziegler et al. supposed that serendipitous items exist in
recommendation lists of different items in different categories
more than in the lists of similar items, and they proposed a
recommendation method to increase diversity of recommendation
lists [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. They defined “intra-list similarity” as the similarity
between all items in a recommendation list by calculating
similarity between two items. Moreover, they increased diversity
by inserting low-similarity items.
      </p>
      <p>
        Approaches that recommend serendipitous items directly have
also been proposed. Hijikata et al. proposed a method for
improving novelty and serendipity by calculating the probability
of known items by using the information about knowns or
unknowns given explicitly by user [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Another method calculates
the probability of “degree of interest” by using an evaluation of
items selected by a user (namely, “interested” or “not interested”).
The items whose degree-of-interest probabilities are nearly equal
are taken as serendipitous and recommended [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Another proposed method considers the items that are different
from the ones users use habitually as serendipitous and
recommends those [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This method uses a preference model to
predict items that users like and a habit model to predict items that
users use habitually. It then recommends a recommendation list
including serendipitous items by predicting the unexpectedness of
items by calculating differences between the results of the
preference model and the habit model.
      </p>
      <p>As mentioned above, the only serendipitous recommendation
methods proposed until now are based on researchers’ own
assumptions; no methods based on actual data regarding a user’s
actual impression of selected items have been devised. Moreover,
many works suppose that serendipitous items mean unexpected
items, and they do not treat items that are unexpected and
interesting.</p>
      <p>In this study, the authors clarified what kinds of items are actually
serendipitous by collecting data concerning a user’s actual
impressions, made assumptions based on that actual data, and
devised two serendipitous recommendation methods based on
those assumptions.
3. MODELING SERENDIPITOUS ITEMS
ACCORDING TO ANALYSIS OF ACTUAL</p>
    </sec>
    <sec id="sec-3">
      <title>DATA</title>
    </sec>
    <sec id="sec-4">
      <title>3.1 User-preference model</title>
      <p>The assumption of user preference was established first, and what
kinds of items are serendipitous for users was verified by
analyzing a user’s actual impressions collected by questionnaires
based on this assumption. The user-preference model established
before the questionnaires were given is explained in the following.
Figure 1 shows the concept of the model. In this model, items are
arranged in feature vector space generated by features of items.
Although this feature vector space is highly dimensional, for
simplicity, two-dimensional space is introduced in Figure 1. Items
that a user selected before exist in the area near the feature vector
that the user recognizes and knows are interesting (so-called
“recognized items” below because the user recognizes them as
interesting and not surprising if recommended). In a distant area
from that area, serendipitous items (namely, surprising and
interesting items) are supposed to exist. In an area far from the
recognized area, not-interesting items are supposed to exist.
Broadly speaking, it is supposed that each user has several
recognized areas in the feature vector space, because there may be
several reasons that the user selected certain interesting items; for
example, the reasons for selecting a drama and a documentary
program may be different.</p>
      <p>Items
Not interesting
recognized
serendipitous
recognized
serendipitous
recognized
serendipitous
To collect users’ actual impressions, a questionnaire was given to
thirty users. The method is mentioned below. First, users read the
information concerning a TV program selected randomly from TV
programs over three months (31,433 programs), and then they
classify these TV programs as recommended items into three
categories: “recognized program” (first-type item), “serendipitous
program” (second type) and “not-interesting program.” An
electric program guide (EPG) is used to provide the information
concerning TV programs, which includes title, performer, and the
other contents of programs.</p>
      <p>In the questionnaire, three categories are available for choice by
users. “Recognized program” means programs that users can
expect from their own preference, for example, programs that
users frequently watch. “Serendipitous program” means programs
that users feel are interesting and surprising when recommended,
for example, programs that users do not expect from their own
preferences but are interested in. “Not-interesting program” means
programs that users are not interested in even though
recommended.</p>
      <p>It takes much time to answer this questionnaire (about one minute
per program evaluation), so each user answered the questionnaire
over one month, from ten to one hundred answers per day. We
supposed that a user’s preference does not change much over one
month, because a series of TV programs lasts about three months.
All users live in Japan, twenty six work at Hitachi, Ltd., Central
Research Laboratory and four are university students. Twenty five
are male, and five are female. Fifteen are from twenty to thirty
years old, eleven from thirty to forty, and the other four from forty
to fifty. Each user evaluated about one thousand to five thousand
programs.</p>
    </sec>
    <sec id="sec-5">
      <title>3.3 Analysis method</title>
      <p>The programs collected by questionnaire are first converted into
term vectors extraction by morphological analysis of text
information in the EPG. Each vector component contains two
values, whether the EPG text includes the term or not. The
recognized programs are then clustered to estimate the recognized
area. For clustering, the distance between program Pi and program
Pj is defined as</p>
      <p>N
distance (Pi , Pj ) = ∑ wn Pi (n) − Pj (n)
(1)
n=1
where Pi(n) means the vector component of the nth term in
program Pi, whether program Pi includes the nth term or not (1 or
0), wn means the user’s weight (a metric of user’s preference) of
the nth term. The user’s own distance between programs is
determined by introducing user’s weight wn.</p>
      <p>
        Weight wn of nth term v is calculated by TFIDF (product of term
frequency and inverse document frequency) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. TFIDF is a
metric of weighting characteristic terms occurring in observed
documents by frequency in observed groups and in all groups.
This metric is introduced to weight a user’s preference as follows.
wn = tfidf (v | D) = tf (v | D) × log(
N all )
N (v)
(2)
Here, D represents observed program, which means recognized
programs here, tf(v|D) means the frequency that term v occurs in
D, Nall means the total number of programs, and N(v) means the
number of occurrences of term v in all programs.
      </p>
    </sec>
    <sec id="sec-6">
      <title>3.4 Results</title>
      <p>Figure 2 shows the number of programs evaluated by each user,
and Figure 3 shows the ratio of recognized programs and
serendipitous programs in all programs. Although each user has
various ratios, it is clear the rates of recognized programs are very
low and there are a lot of inefficient programs. It is also clear that
users who frequently watch TV programs evaluate more programs
as recognized rather than serendipitous. On the other hand, the
users who rarely watch TV programs evaluate more programs as
serendipitous rather than recognized.</p>
      <p>In regard to the questionnaire, most users said they feel
serendipitous concerning the programs that they do not know
before but are interesting (for example, interesting educational
programs for users who do not watch educational programs) and
the programs including an unexpected combination of interesting
features (for example, educational programs featuring a
comedian). However, surprising programs are not always
unexpected programs, so the meanings of surprising would
include other factors. Moreover, some users evaluated no
programs as serendipitous, and some users cannot classify
programs into the three types; consequently, it is difficult to
evaluate their subjective impression quantitatively.</p>
      <p>A clustering result of recognized programs is shown as
dendrogram in Figure 4. The clustering method used is
hierarchical clustering. The height of the cluster means average
distance between programs belonging to the cluster and the cluster
center calculated from Equation (1). The number of recognized
areas is determined by cutting at a certain height of a cluster.
Figure 5 shows the ratio of average distance of recognized
programs (radius of recognized area) and average distance of
notinteresting programs (radius of not-interesting area) from the
nearest center of the cluster with height of clusters. When the
number of clusters increases, not-interesting programs are
distributed outside of recognized area. On the other hand, when
the number of clusters decreases, not-interesting programs are
distributed inside the recognized area because the number of
clusters is fewer than the true number of recognized areas.
Figure 6 shows the ratio of average distance of serendipitous
programs (radius of serendipitous area) and the radius of a
recognized area from the nearest center of the cluster with height
of clusters. As the number of clusters increases, serendipitous
programs are distributed outside of the recognized area.
00
.0 572978904235724777860566850609420719299856835115149333569923694065435070345902454345227361255754995306802346861686672397623227464074901575266461195387326044111507418705300452236263043518200662480198698531586436444665625651605305594475951193579485
134233198260221244173256125282110255147293223255222284237152154282253255190114125274255284225214244205243285273R223255263e292274c20861o138195187g198242n271221252i256z287268e296247d242265214293P214244r274204o224253g283224274r283a214294m205264274292s25429227428718816710919717828121225012913911615816710719723814819799289124152239238254151125128127159173191141161220118883448169220
(leaf nodes: recognized programs; vertical axis: height of cluster)</p>
      <sec id="sec-6-1">
        <title>Height</title>
        <p>Recognized +
Serendipitous ×</p>
      </sec>
      <sec id="sec-6-2">
        <title>Height</title>
        <p>(Denomination: not-interesting area)
納得感のクラスター</p>
        <p>as.dist(x)
hclust(*,"average")</p>
      </sec>
      <sec id="sec-6-3">
        <title>Height</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Model based on analysis results</title>
      <p>To summarize the results presented in this section, in the feature
vector space generated by EPG texts, not-interesting programs are
distributed outside the recognized area and serendipitous
programs are distributed far outside the recognized area. This
result does not support the assumption in Figure 1. We therefore
suggest the structure of user preference as shown in Figure 8
instead of that shown in Figure 1.</p>
      <p>Distance from the center of the recognized area means the number
of terms in the program vector but not in the center because the
program-vector components are described by only two values,
whether each term in the contents of programs is included or not.
In addition, the weight of terms is calculated as a user’s
preference by TFIDF. Therefore, even though the item includes
many low-weight terms and is rarely watched, the distance from
the recognized area is not far. And if the program includes
highweight terms belonging to the other recognized area, the distance
from recognized area becomes far. Consequently, programs
including many high-weight terms belonging to the other
recognized area and not similar to the ones in the nearest
recognized area are distributed in the intermediate region of
recognized areas, and users treat them as serendipitous programs.
This assumption expresses that “the contents makes users feel
serendipity concerning an unexpected combination of program
contents,” which some users commented in the questionnaire.
7, not-interesting programs are distributed broadly both in the
recognized area and the serendipitous area; consequently, it is
difficult to distinguish only serendipitous programs accurately by
distance between programs given by Equation (1).</p>
      <p>Items
4. PROPOSAL AND EVALUATION OF
RECOMMENDATION METHODS</p>
    </sec>
    <sec id="sec-8">
      <title>4.1 Proposed methods</title>
      <p>4.1.1</p>
      <sec id="sec-8-1">
        <title>Using distance between items</title>
        <p>The distance between items used in this method is calculated from</p>
        <p>Equation (1) reflecting a user’s preference. First, the proposed
y
ity it
sn s 60
de n
e
d
0
2
1
0
0
1
0
8
0
4
0
2
0
0.00
0.02
0.08</p>
        <p>0.10
0.04</p>
        <p>0.06
ddiissttaanncece
solid line: not interesting programs, dotted line: serendipitous programs, dashed line: recognized programs)
recommender system learns features of programs according to the
user’s viewing history. In the same way as described in section 3,
program vectors are defined by a term vector, whose component
has two values. Second, the system splits watched programs (i.e.,
recognized programs) into several clusters by hierarchical
clustering and finds the centers of recognized areas. The number
of recognized areas is defined as 7 to 10 according to the results
from the questionnaire. The system then calculates the distance of
each not-watched program from the nearest center, and
recommends the 10 longest programs. In short, the system
recommends 10 highest score programs calculated according to
Score( Pi ) = distance (Pi, Cnearest )</p>
        <p>(3)
Here, Cnearest means the center of the nearest recognized area with
program Pi.</p>
        <p>This method may not recommend serendipitous programs
accurately because not-interesting programs are distributed
broadly. This method is referred to as the “first method” hereafter.
4.1.2</p>
      </sec>
      <sec id="sec-8-2">
        <title>Using general unexpectedness</title>
        <p>This method (hereafter, “second method”) introduces
“unexpectedness of programs” in addition to the distance used in
the first method in order to capture a “surprising” factor. The
results of the questionnaire indicate that the serendipitous
programs have an unexpected aspect for users, as shown in Fig. 8.
It is assumed that “unexpectedness” means something is hard to
predict the program contents. Regarding a
programrecommendation system, it is assumed that it is related to an
unlikely combination of features. The second method treats highly
unexpected and interesting programs as serendipitous programs. A
general metric of difficulty of expecting programs for every user
is defined by the sum of the tendencies of co-occurrence of the
terms in the program.</p>
        <p>Expectedness(Pi )
=
=
1
1
Pi v,w∈Pi</p>
        <p>∑
Pi v,w∈Pi Nv + Nw − Nvw</p>
        <p>Nvw
∑ Tendency of co - occurrence(v, w)
(4)
Tendency of co-occurrence (v, w) means tendency of
cooccurrence of terms v and w in all programs. It makes it possible
to evaluate quantitatively how unexpected a program is for users.
Nv means number of programs including term v, and Nvw means
number of programs including both term v and w. |Pi| means
number of terms in program Pi and is a normalized factor. If the
co-occurrence of the terms is low, expectedness will be low, and
the program will be highly unexpected, so users would be unable
to find it. Unexpectedness is defined as the inverse of
expectedness (see Equation (6)), and 10 high-score programs
(calculated according to the sum of squares of distance between
programs and unexpectedness as below) are recommended.
Score ( Pi ) = α × distance ( Pi , C nearest )
2
+ (1 − α ) × Unexpected ness ( Pi ) 2
(5)</p>
        <p>Unexpected ness ( Pi ) =
Expectednes s( Pi )
Parameter α controls the degree of combination of a user’s
preference and unexpectedness of programs. Simply put, equation
(5) is a very simple linear combination of squares of distance and
unexpectedness.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>4.2 Evaluation method</title>
      <p>4.2.1</p>
      <sec id="sec-9-1">
        <title>Dataset</title>
        <p>The results of the questionnaire implemented in the third section
were used. Data of fourteen users who classified more than 100
programs into recognized or serendipitous programs were selected,
because it was supposed that serendipitous recommendation
becomes necessary after watching TV programs for about one
month. (It was assumed that users get bored with typical
recommendation after about one month and most users watch fifty
TV programs per month). Each user evaluated from 1000 to 5000
programs, and the ratio of serendipitous programs in all evaluated
programs is 7 to 8%.</p>
      </sec>
      <sec id="sec-9-2">
        <title>4.2.2 Procedure</title>
        <p>The three proposed methods are applied to each user. The
procedure is mentioned below. First, the system learns recognized
areas from fifty recognized programs. In this evaluation
experiment, 50 recognized programs were prepared randomly as a
training set from evaluated programs as recognized. Next, the
system recommends ten high-score programs by using the
proposed methods, random recommendation, and a method using
only unexpectedness for each user from the remaining evaluated
programs by using the recognized areas learned first. Random
recommendation means recommending ten programs randomly,
and the method using only unexpectedness calculates a program
score according to unexpectedness only (α=0 in Equation (5)).
This experiment was performed ten times, and each time different
recognized programs were used and the accuracy of each method
was compared.</p>
      </sec>
      <sec id="sec-9-3">
        <title>4.2.3 Evaluation metrics</title>
        <p>Our purpose is to recommend serendipitous programs. So we use
detection rate and precision as evaluation metrics for the purpose
of evaluating accuracy of the proposed methods to detect
serendipitous programs. Detection rate means the probability of
detecting a serendipitous program and precision means rate of
serendipitous programs in recommendation list.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>4.3 Results</title>
      <p>Table 1 lists the evaluation results of the two proposed methods,
random recommendation and only unexpectedness. Accuracy
metrics are calculated as an average of users. Parameter α is set
to 0.05, so the second method has the highest accuracy, .
The results in Table 1 show that detection rate and precision of
random recommendation are low, so it suggests how difficult it is
to recommend serendipitous programs. On the other hand, the
accuracy of the second method (i.e., using unexpectedness of
programs) is higher than the other methods, detection rate is
78.2% and precision is 21.6%. This result means that the second
method recommends serendipitous programs accurately.
While accuracy of the first method (i.e., using distance only) is the
same as that of the random method, accuracy of the second
method is much higher than the random one, and accuracy of the
unexpectedness-only method is lower than that of the random one.
This result shows it is possible to recommend serendipitous
programs by using both distance reflecting a user’s preference and
unexpectedness of programs.</p>
      <p>The first method recommends programs including not-interesting
ones, because it recommends items that are not similar to
recognized programs. On the other hand, the second method
distinguishes “unexpected and interesting programs” and
“unexpected but not-interesting programs” from programs with
low similarity according to unexpectedness. Consequently, the
accuracy of the second method is high.</p>
      <p>Figure 10 shows the concept of user preference by distance and
unexpectedness inferred from these results. Serendipitous
programs and not-interesting programs are distant from the
recognized area. According to the result “only unexpectedness” in
Table 1, serendipitous programs exist in extra
highunexpectedness areas because they tend to have more
combinations of terms whose tendency of co-occurrence is low.
Moreover, in the right lower box, not-interesting programs may
exist. It seems very possible that the user would already know the
highly unexpected programs near to recognized programs and not
select them, because “unexpectedness” is a general metric and
does not depend on a user’s record.</p>
      <p>e
c
n
a
t
s
i
d</p>
      <p>Not interesting</p>
      <p>Serendipitous
Recognized</p>
      <p>Not interesting</p>
      <p>Unexpectedness
Unexpectedness of programs calculated from tendency of
cooccurrence of terms in the programs is introduced here. For
example, users find programs by reading TV guides and EPGs on
web sites. Therefore, programs that have rare contents in TV
guides are supposed to be serendipitous. TV guides and EPG are
not provided by users but by the surroundings of users, so we
simply introduce unexpectedness independently from a user’s
characteristics. Examinees in this experiment are deemed to live
in similar environments. The influence of unexpectedness for
users living in totally different environments (e.g., living in
different countries) might be significant. Unexpectedness may
therefore be a frequency of contact with items similar to the
relevant item which a user contacts with so far, with or without
intention.</p>
      <p>Finally, our proposed method is compared with the other related
methods. It is hard to compare by accuracy because serendipity
depends on user’s subjective impression, so we compare these by
requirements in Table. 2.
Related works require some information concerning users, one
requires a user’s impression of recommended items, another
requires other users’ records, and another requires user’s habits.
The proposed recommendation method requires few evaluation
values to learn a user’s preference and does not depend on user’s
surroundings. On the other hand, it requires information
concerning programs, but recently there is much information
regarding programs on Internet reference sites like Wikipedia. In
short, the proposal method has most broad utility regarding
various systems because it is useful for both devices and servers.
As for our future work, however, which method satisfies users
must be verified by a user’s subjective evaluation. On the other
hand, we suggest using suitable terms for each user.</p>
    </sec>
    <sec id="sec-11">
      <title>5. FUTURE WORK</title>
      <p>Although the accuracy of our proposal serendipitous
recommendation method was verified, the following three tasks
remain as future work: improve accuracy, evaluate by more users,
and tune performance of actual system
To improve accuracy, it is necessary to select the recognized area
outside of which many serendipitous programs exist; in fact, there
are some recognized areas outside of which serendipitous
programs do not exist. By considering the radius and number of
programs included in recognized areas, it is possible to select the
best recognized area. Moreover, another approach to improving
accuracy is to get rich information concerning programs via
metadata and information on web sites.</p>
      <p>It is also necessary to satisfy users by capturing user context with
their spatial temporal information; for example, a user does not
want to watch a program in the morning but in the evening instead.
It is also important to capture time-dependent user preferences, for
example, users feel serendipity if a recommended program was
not watched recently but has been watched in the past. With our
recommendation method, a user’s preference is described in a
feature vector space generated by the user’s selection history, so
the structure of the space and distribution of user preference
depends on time.</p>
      <p>To make the user-preference model statistically strong, it is
necessary to evaluate our proposed method by more users,
because the concept of serendipity is supposed to depend strongly
on user’s subjective impression. Moreover, it is important to
establish methods for evaluating a user’s satisfaction
quantitatively.</p>
      <p>To introduce our recommendation method in an actual system, it
is necessary to design an optimal data structure and speed up the
method.</p>
      <p>In this study, we verified the recommendation method by using
TV programs, but this approach can be applied to recommend
items like books and DVDs from a user’s record of selecting TV
programs. We plan to use this approach to capture the meanings
of users like and dislike by collecting and analyzing user’s records.</p>
    </sec>
    <sec id="sec-12">
      <title>6. CONCLUSION</title>
      <p>To realize serendipitous recommendation, a recommendation
method for extracting a user’s preference was proposed and
evaluated. In particular, based on actual data obtained by giving a
questionnaire to thirty users, a user-preference model using
distance between programs was established. Based on this model,
a serendipitous recommendation method using the distance and
unexpectedness of programs was proposed. This method
recommends a serendipitous program accurately at a detection
rate is 78.2%. Moreover, it was found that the impression of
unexpectedness depends on a user’s living environment rather
than his or her character. This result is an important fact in regard
to understanding a user’s preference in principle.</p>
    </sec>
    <sec id="sec-13">
      <title>7. REFERENCES</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Toms</surname>
          </string-name>
          <article-title>: Serendipitous Information Retrieval</article-title>
          ,
          <source>Proc. of DELOS Workshop</source>
          ,
          <year>2000</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Herlocker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <source>Evaluating Collaborative Filtering Recommender Systems, ACM Transactions on Information Systems</source>
          , Vol.
          <volume>22</volume>
          , No.
          <issue>1</issue>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>53</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Swearingen</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sinha</surname>
          </string-name>
          <article-title>: Beyond Algorithms: An HCI Perspective on Recommender Systems</article-title>
          ,
          <source>ACM SIGIR Workshop on Recommender Systems</source>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>McNee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          <article-title>: Making Recommendations Better: An Analysis Model for Human-Recommender Interaction, In proc</article-title>
          . of ACM Special Interest Group on Computer-Human
          <source>Interaction (ACM SIGCHI)</source>
          , pp.
          <fpage>997</fpage>
          -
          <lpage>1101</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>McNee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          <article-title>: Being accurate is not enough: How accuracy metrics have hurt recommender systems</article-title>
          ,
          <source>In proc. of ACM Special Interest Group on Computer-Human Interaction (ACM SIGCHI)</source>
          , pp.
          <fpage>997</fpage>
          -
          <lpage>1101</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C. N.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          , G. Lausen, and
          <string-name>
            <surname>L. S.</surname>
          </string-name>
          <article-title>Thieme: Taxonomy-driven Computation of Product Recommendations</article-title>
          ,
          <source>In proc. of the 2004 ACM CIKM Conference on Information and Knowledge Management</source>
          , pp.
          <fpage>406</fpage>
          -
          <lpage>415</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Cai-Nicolas</surname>
            <given-names>Ziegler</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sean M. McNee</surname>
            ,
            <given-names>Joseph A.</given-names>
          </string-name>
          <string-name>
            <surname>Konstan</surname>
          </string-name>
          , and George Lausen:
          <article-title>Improving Recommendation Lists Through Topic Diversification</article-title>
          ,
          <source>In proc. of World Wide Web Conference</source>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hijikata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shimizu</surname>
          </string-name>
          , and S. Nishida:
          <article-title>Discovery-oriented Collaborative Filtering for Improving User Satisfaction</article-title>
          ,
          <source>In proc. of the 14th ACM International Conference on Intelligent User Interfaces(ACM IUI</source>
          <year>2009</year>
          ), pp.
          <fpage>67</fpage>
          -
          <lpage>76</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Leo</given-names>
            <surname>Iaquinta</surname>
          </string-name>
          , Macro de Gemmis, Pasquale Lops, Giovanni Semeraro, Michele Filannino, and Piero Molino:
          <article-title>Introducing Serendipity in a Content-based Recommender System</article-title>
          ,
          <source>Hybrid Intelligent Systems</source>
          ,
          <year>2008</year>
          . HIS '08. Eighth International Conference on 10-12 Sept.
          <year>2008</year>
          , pages
          <fpage>168</fpage>
          -
          <lpage>173</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Murakami</surname>
          </string-name>
          , et al.:
          <article-title>A Method to Enhance Serendipity in Recommendation and its Evaluation, Transactions of the Japanese Society for Artificial Intelligence</article-title>
          , Vol.
          <volume>24</volume>
          , Issue 5, pp.
          <fpage>428</fpage>
          -
          <lpage>436</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Li-Ping Jing</surname>
          </string-name>
          , et al.:
          <article-title>Improved Feature Selection Approach TFIDF In Text Mining</article-title>
          ,
          <source>Proceedings of the First International Conference on Machine Learning and Cybernetics</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>