<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Decisions@RecSys workshop</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Enhancement of the Neutrality in Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Toshihiro Kamishima</string-name>
          <email>mail@kamishima.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shotaro Akaho</string-name>
          <email>s.akaho@aist.go.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hideki Asoh</string-name>
          <email>h.asoh@aist.go.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jun Sakuma</string-name>
          <email>jun@cs.tsukuba.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Tsukuba</institution>,
          <addr-line>1-1-1 Tennodai, Tsukuba, 305-8577</addr-line>; and
          <institution>Japan Science and Technology Agency</institution>,
          <addr-line>4-1-8 Honcho, Kawaguchi, Saitama, 332-0012</addr-line>,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Advanced Industrial Science and Technology (AIST)</institution>,
          <addr-line>AIST Tsukuba Central 2, Umezono 1-1-1, Tsukuba, Ibaraki, 305-8568</addr-line>,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <volume>9</volume>
      <issue>2012</issue>
      <fpage>8</fpage>
      <lpage>14</lpage>
      <abstract>
        <p>This paper proposes an algorithm for making recommendations such that neutrality toward a viewpoint specified by a user is enhanced. This algorithm is useful for avoiding decisions based on biased information. Such a problem has been pointed out as the filter bubble, which is the influence on social decisions of biases introduced by personalization technology. To provide such recommendations, we assume that a user specifies the viewpoint toward which the user wants to enforce neutrality, because a recommendation that is neutral with respect to all information is no longer a recommendation. Given such a target viewpoint, we implemented an information neutral recommendation algorithm by introducing a penalty term that enforces statistical independence between the target viewpoint and a preference score. We empirically show that our algorithm enhances the independence from the specified viewpoint and then demonstrate how the sets of recommended items change.</p>
      </abstract>
      <kwd-group>
        <kwd>neutrality</kwd>
        <kwd>fairness</kwd>
        <kwd>filter bubble</kwd>
        <kwd>collaborative filtering</kwd>
        <kwd>matrix decomposition</kwd>
        <kwd>information theory</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.3.3 [INFORMATION SEARCH AND RETRIEVAL]:
Information filtering</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>
        A recommender system searches for items and information
that would be useful to a user based on the user's behaviors
or the features of candidate items [
        <xref ref-type="bibr" rid="ref2 ref21">21, 2</xref>
        ]. GroupLens [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
and many other recommender systems emerged in the
mid-1990s, and further experimental and practical systems have been developed during the explosion of Internet merchandizing. In the past decade, such recommender systems have been introduced and managed at many e-commerce sites to promote the items sold at these sites.
      </p>
      <p>During the RecSys 2011 panel discussion, panelists made the following assertions about the filter bubble problem. Biased topics would certainly be selected under the influence of personalization, but at the same time, it would be intrinsically impossible to make recommendations that are absolutely neutral from any viewpoint, and the diversity of the provided topics intrinsically has a trade-off relation to the fitness of those topics for users' interests or needs. To recommend something, or more generally to select something, one must consider a specific aspect of a thing and must ignore its other aspects. The panelists also pointed out that current recommender systems fail to satisfy users' need to explore a wide variety of topics in the long term.</p>
      <p>To solve this problem, we propose an information neutral recommender system that guarantees the neutrality of recommendation results. As pointed out during the RecSys 2011 panel discussion, because it is impossible to make a recommendation that is absolutely neutral from all viewpoints, we consider neutrality from the viewpoint or information specified by a user. For example, users can specify a feature of an item, such as a brand, or a user feature, such as a gender or an age, as a viewpoint. An information neutral recommender system is designed so that these specified features will not affect recommendation results. This system can also be used to avoid the use of information that is restricted by law or regulation. For example, privacy policies may prohibit the use of certain information for the purpose of making recommendations.</p>
      <p>
        We borrowed the idea of fairness-aware mining, which we
proposed earlier [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], to build this information neutral recommender system. To enhance the neutrality, or independence, in recommendation, we introduce a regularization term that represents the mutual information between a recommendation result and the specified viewpoint. Our contributions are as follows. First, we present a definition of neutrality in recommendation based on the consideration of why it is impossible to achieve an absolutely neutral recommendation. Second, we propose a method to enhance the neutrality that we defined and combine it with a latent factor recommendation model. Finally, we demonstrate that the neutrality of recommendation can be enhanced and show how recommendation results change as the neutrality is enhanced. In section 2, we discuss the filter bubble problem and neutrality in recommendation and define the goal of an information neutral recommender task. An information neutral recommender system is proposed in section 3, and its experimental results are shown in section 4. Sections 5 and 6 cover related work and our conclusion, respectively.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. INFORMATION NEUTRALITY</title>
      <p>In this section, we discuss information neutrality in recommendation based on considerations of the filter bubble problem and the ugly duckling theorem.</p>
    </sec>
    <sec id="sec-4">
      <title>2.1 The Filter Bubble Problem</title>
      <p>
        We here summarize the filter bubble problem posed by Pariser and the discussion of this problem in the panel at the RecSys 2011 conference. The filter bubble problem is the concern that personalization technologies, including recommender systems, narrow and bias the topics of information provided to people without their noticing it [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
Pariser demonstrated the following examples in a TED talk
about this problem [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. In a social network service, Facebook, users specify a group of friends with whom they can chat or have private discussions. To help users find their friends, the service has a function that lists other users' accounts expected to be related to a user. When Pariser started to use Facebook, the system showed a recommendation list that consisted of both conservative and progressive people. However, because he more frequently selected progressive people as friends, conservative people were excluded from his recommendation list by the personalization functionality. Pariser claimed that the system excluded conservative people without his permission and that he lost the opportunity to get a wide variety of opinions.
      </p>
      <p>He furthermore demonstrated a collection of search results from Google for the query "Egypt" during the Egyptian uprising in 2011, gathered from various people. Even though such a highly important event was occurring, only sightseeing pages were listed for some users instead of news pages about the Egyptian uprising, due to the influence of personalization. With this example, he claimed that personalization technology spoiled the opportunity to obtain information that should be commonly shared in our society.</p>
      <p>We consider that Pariser's claims can be summarized as follows. The first point is the problem that users lose opportunities to obtain information about a wide variety of topics; the chance to know things that could make users' lives fruitful is lessened. The second point is the problem that each individual obtains information that is too personalized, and thus the amount of shared information is decreased. Pariser claimed that this loss of the ability to share information is a serious obstacle for building consensus in our society.</p>
      <p>
        RecSys 2011, a conference on recommender systems, held a panel discussion on this filter bubble problem [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. This panel concentrated on the following three points. (1) Are there filter bubbles? Resnick pointed out the possibility that personalization technologies narrow users' experience as early as the mid-1990s. Because selecting specific information by definition leads to ignoring other information, the diversity of users' experiences intrinsically has a trade-off relation to the fitness of information for users' interests. As seen in the difference between the perspective of al-Jazeera and that of Fox News, this problem exists irrespective of personalization. Further, given signals or expressions of users' interest, it is difficult to adjust how much a system should meet those interests.
(2) To what degree is personalized filtering a problem? There is no absolutely neutral viewpoint. On the other hand, the use of personalized filtering is inevitable, because it is not feasible to exhaustively access the vast amount of information in the universe. One potential concern is the effect of selective exposure, the tendency to seek reinforcement of what people already believe. According to studies of this concern, the effect is not so serious, because people viewing extreme sites spend more time on mainstream news as well.
(3) What should we as a community do to address the filter bubble issue? To adjust the trade-off between diversity and fitness of information, a system should consider users' immediate needs as well as their long-term needs. Instead of selecting individual items separately, a recommendation list or portfolio should be optimized as a whole.
      </p>
    </sec>
    <sec id="sec-5">
      <title>2.2 The Neutrality in Recommendation</title>
      <p>The absence of an absolutely neutral viewpoint was pointed out in the above panel. We here discuss this point more formally, based on the ugly duckling theorem.</p>
      <p>
        The ugly duckling theorem is a classical theorem in the pattern recognition literature that asserts the impossibility of classification without weighing certain features or aspects of objects against the others [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Consider a case in which n ducklings are represented by at least log2 n binary features, for example, black feathers or a fat body, and are classified into positive or negative classes based on these features. If the positive class is represented by Boolean functions of the binary features, it is easy to prove that the number of possible functions that classify an arbitrary pair of ducklings into the positive class is 2^(n-2), whichever pair of ducklings is chosen. Provided that the similarity between a pair of ducklings is measured by the number of functions that classify them into the same class, the similarity between an ugly duckling and an arbitrary normal duckling is equal to the similarity between any pair of normal ducklings. In other words, an ugly duckling looks like a normal duckling.
      </p>
      <p>Why is an ugly duckling ugly? As described above, an ugly duckling is as ugly as a normal duckling if all features and functions are treated equally. It is the attention to a particular feature, such as black feathers, that makes an ugly duckling ugly. When we classify something, we of necessity pay attention to certain features, aspects, or viewpoints of the classified objects. Because recommendation can be considered a task of classifying items into a relevant class or an irrelevant one, certain features or viewpoints must inevitably be weighed when making a recommendation. Consequently, absolutely neutral recommendation is intrinsically impossible.</p>
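The counting step in this argument can be checked by brute force. A minimal sketch (the helper name and the choice n = 6 are illustrative, not from the paper):

```python
from itertools import combinations

# Brute-force check of the ugly duckling counting argument: if the
# positive class may be any subset of the n ducklings, the number of
# classes containing both members of a given pair is 2^(n-2),
# whichever pair is chosen.
def shared_class_count(n, pair):
    count = 0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if pair[0] in subset and pair[1] in subset:
                count += 1
    return count

n = 6
counts = [shared_class_count(n, pair) for pair in combinations(range(n), 2)]
# Every pair shares the same number of classes: the "ugly" duckling is
# as similar to any other duckling as any two "normal" ducklings are.
assert all(c == 2 ** (n - 2) for c in counts)
```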
      <p>We propose a neutral recommendation task other than the absolutely neutral recommendation. Recalling the ugly duckling theorem, we must focus on certain features or viewpoints in classification. This fact indicates that it is feasible to make a recommendation that is neutral from a specific viewpoint instead of from all viewpoints. We hence advocate an Information Neutral Recommender System that enhances the neutrality of recommendation from a viewpoint specified by a user. In the case of Pariser's Facebook example, such a system enhances the neutrality with respect to whether recommended friends are conservative or progressive, but it is allowed to make biased decisions in terms of other viewpoints, for example, the birthplace or age of friends.</p>
    </sec>
    <sec id="sec-6">
      <title>3. AN INFORMATION NEUTRAL RECOMMENDER SYSTEM</title>
      <p>In this section, we formalize the task of information neutral recommendation and present a solution algorithm for this task.</p>
    </sec>
    <sec id="sec-8">
      <title>3.1 Task Formalization</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], recommendation tasks are classified into Recommending Good Items that meet a user's interest, Optimizing Utility of users, and Predicting Ratings of items for a user. Among these tasks, we here concentrate on the task of predicting ratings.
      </p>
      <p>We formalize an information neutral variant of the predicting ratings task. x ∈ {1, …, n} and y ∈ {1, …, m} denote a user and an item, respectively. An event (x, y) is a pair of a specific user x and a specific item y. Here, s denotes the rating value of y as given by x. We assume that the domain of ratings is the real values, though the domain of ratings is commonly a set of discrete values, e.g., {1, …, 5}. These variables are common to the original predicting ratings task.</p>
      <p>To treat the information neutrality in recommendation, we additionally introduce a viewpoint variable, v, which indicates the viewpoint from which neutrality is enhanced. This variable is specified by a user, and its value depends on an event. Possible examples of a viewpoint variable are a user's gender, which depends on the user part of an event; a movie's release year, which depends on the item part of an event; and the timestamp at which a user rates an item, which depends on both elements of an event. In this paper, we restrict the domain of a viewpoint variable to the binary type, {0, 1}, but it is easy to extend to the multinomial case. An example consists of an event, (x, y), a rating value for the event, s, and a viewpoint value for the event, v. A training set is a set of N examples, D = {(x_i, y_i, s_i, v_i)}, i = 1, …, N.</p>
      <p>Given a new event, (x, y), and its corresponding viewpoint value, v, a rating prediction function, ŝ(x, y, v), predicts the rating value of an item y by a user x. While this rating prediction function is estimated in our task setting, a loss function, loss(s*, ŝ), and a neutrality function, neutral(ŝ, v), are given as task inputs. The loss function represents the dissimilarity between a true rating value, s*, and a predicted rating value, ŝ. The neutrality function quantifies the degree of neutrality of a rating value from the viewpoint expressed by the viewpoint variable. Given a training set, D, the goal of information neutral recommendation (the predicting ratings case) is to acquire a rating prediction function, ŝ(x, y, v), such that the expected value of the loss function is as small as possible and the expected value of the neutrality function is as large as possible over (x, y, v). We formulate this goal as finding a rating prediction function, ŝ, that minimizes the following objective function:
        loss(s*, ŝ(x, y, v)) - η neutral(ŝ(x, y, v), v),    (1)
where η &gt; 0 is a parameter to balance the loss and the neutrality.</p>
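The per-example trade-off in objective (1) can be sketched in code as follows; the function names and the placeholder neutrality term are illustrative, not the paper's implementation:

```python
# Sketch of objective (1): prediction loss traded off against a
# neutrality term, balanced by eta > 0.  The concrete loss and
# neutrality functions here are illustrative placeholders.
def squared_loss(s_true, s_pred):
    return (s_true - s_pred) ** 2

def objective(s_true, s_pred, v, eta, neutral):
    # loss(s*, s_hat) - eta * neutral(s_hat, v)
    return squared_loss(s_true, s_pred) - eta * neutral(s_pred, v)

# With eta = 0 the objective reduces to the pure prediction loss.
assert objective(4.0, 3.5, 1, 0.0, lambda s, v: 0.0) == 0.25
```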
    </sec>
    <sec id="sec-9">
      <title>3.2 A Prediction Model</title>
      <p>
        In this paper, we adopt a latent factor model for predicting ratings. This latent factor model, which is a kind of matrix decomposition model, is defined as equation (3) in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], as follows:
        ŝ(x, y) = μ + b_x + c_y + p_x q_y^T,    (2)
where μ, b_x, and c_y are the global, per-user, and per-item bias parameters, respectively, and p_x and q_y are K-dimensional parameter vectors that represent the cross effects between users and items. We adopt a squared loss as the loss function. As a result, the parameters of the rating prediction function can be estimated by minimizing the following objective function:
        Σ_{(x_i, y_i, s_i) ∈ D} (s_i - ŝ(x_i, y_i))^2 + λ R,    (3)
where R represents an L2 regularizer for the parameters b_x, c_y, p_x, and q_y, and λ is a regularization parameter. Once we have learned the parameters of the rating prediction function, we can predict a rating value for any event by applying equation (2).
      </p>
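A minimal sketch of the predictor of equation (2) and the regularized objective of equation (3) might look as follows; the parameter names, dimensions, and random initialization are illustrative assumptions, not the paper's code:

```python
import numpy as np

# A minimal latent factor predictor (equation (2)): global bias mu,
# per-user bias b[x], per-item bias c[y], and K-dimensional cross terms.
rng = np.random.default_rng(0)
n_users, n_items, K = 5, 7, 2
mu = 3.5
b = rng.normal(0, 0.1, n_users)
c = rng.normal(0, 0.1, n_items)
p = rng.normal(0, 0.1, (n_users, K))
q = rng.normal(0, 0.1, (n_items, K))

def predict(x, y):
    # Equation (2): mu + b_x + c_y + p_x . q_y
    return mu + b[x] + c[y] + p[x] @ q[y]

def objective(data, lam):
    # Equation (3): squared loss plus an L2 regularizer with weight lam.
    loss = sum((s - predict(x, y)) ** 2 for x, y, s in data)
    reg = lam * (b @ b + c @ c + (p * p).sum() + (q * q).sum())
    return loss + reg
```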
      <p>
        We then extend this model to enhance the information neutrality. First, we modify the model of equation (2) so as to depend on the value of the viewpoint variable, v. For each value of v, 0 and 1, we prepare a parameter set, μ^(v), b_x^(v), c_y^(v), p_x^(v), and q_y^(v). One of the parameter sets is chosen according to the value of v, and we get the rating prediction function:
        ŝ(x, y, v) = μ^(v) + b_x^(v) + c_y^(v) + p_x^(v) (q_y^(v))^T.    (4)
We next define a neutrality function to quantify the degree of the information neutrality from a viewpoint variable, v. In this paper, we borrow an idea from [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and quantify the degree of the information neutrality by negative mutual information, under the assumption that neutrality is regarded as statistical independence. The neutrality function is defined as:
      </p>
      <p>
        I(ŝ; v) = Σ_{v ∈ {0,1}} ∫ Pr[ŝ, v] log (Pr[ŝ|v] / Pr[ŝ]) dŝ = Σ_{v ∈ {0,1}} Pr[v] ∫ Pr[ŝ|v] log (Pr[ŝ|v] / Pr[ŝ]) dŝ.    (5)
The marginalization over v is then replaced with the sample mean over a training set, D, and we get
        (1/N) Σ_{(v_i) ∈ D} ∫ Pr[ŝ|v_i] log (Pr[ŝ|v_i] / Pr[ŝ]) dŝ.    (6)
Note that Pr[ŝ] can be calculated by Σ_v Pr[ŝ|v] Pr[v], and we use the sample mass function as Pr[v].
      </p>
      <p>Now, all that we have to do is to compute the distribution Pr[ŝ|v], but this computation is difficult. This is because the value of the function ŝ is not probabilistic but rather deterministic, depending on x, y, and v; thus the distribution Pr[ŝ|x, y, v] has the form of a collection of Dirac delta functions, δ(ŝ(x, y, v)). Pr[ŝ|v] can be obtained by marginalizing this distribution over x and y. As a result, Pr[ŝ|v] also becomes a hyper function like Pr[ŝ|x, y, v], and it is not easy to manipulate. We therefore introduce a histogram model to represent Pr[ŝ|v]. Values of predicted ratings, ŝ, are divided into bins, because sample ratings are generally discrete, and the distribution P̃r[ŝ|v] is expressed by this histogram model. By replacing Pr[ŝ|v] with P̃r[ŝ|v], equation (6) becomes
        (1/N) Σ_{(v_i) ∈ D} Σ_{ŝ ∈ Bin} P̃r[ŝ|v_i] log (P̃r[ŝ|v_i] / P̃r[ŝ]),    (7)
where Bin denotes the set of bins of the histogram. Note that because the distribution function, Pr[ŝ|v], is replaced with a probability mass function, P̃r[ŝ|v], the integration over ŝ is replaced with a summation over bins.</p>
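The histogram approximation of equations (5)-(7) can be sketched as follows; the bin edges and the helper name are assumptions for illustration, not the paper's code:

```python
import numpy as np

# Sketch of the histogram approximation of the mutual information term
# (equations (5)-(7)), assuming a binary viewpoint v in {0, 1}.
def mutual_information(s_pred, v, bin_edges):
    s_pred, v = np.asarray(s_pred, dtype=float), np.asarray(v)
    eps = 1e-12
    p_v = np.array([(v == 0).mean(), (v == 1).mean()])  # sample mass Pr[v]
    hists = []
    for val in (0, 1):
        h, _ = np.histogram(s_pred[v == val], bins=bin_edges)
        hists.append(h / max(h.sum(), 1))               # histogram Pr[s|v]
    hists = np.array(hists)
    p_s = p_v @ hists                                   # Pr[s] = sum_v Pr[s|v] Pr[v]
    mi = 0.0
    for val in (0, 1):
        mi += p_v[val] * np.sum(hists[val] * np.log((hists[val] + eps) / (p_s + eps)))
    return mi

# Five bins centered at the rating values 1..5, as in the experiments.
edges = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
```

When the predicted-rating distributions for v = 0 and v = 1 coincide, this quantity is zero, i.e., the prediction is fully neutral from v.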
      <p>By substituting equations (4) and (7) into equation (1) and adopting a squared loss function as in the original latent factor case, we obtain the objective function of an information neutral recommendation model:
        L(D) = Σ_{(x_i, y_i, s_i, v_i) ∈ D} (s_i - ŝ(x_i, y_i, v_i))^2 + η I(ŝ; v) + λ R,    (8)
where the regularization term, R, is the sum of the L2 regularizers of the parameter sets for each value of v. The model parameters, {μ^(v), b_x^(v), c_y^(v), p_x^(v), q_y^(v)}, v ∈ {0, 1}, are estimated so as to minimize this objective function. However, it is very difficult to derive an analytical form of the gradients of this objective function, because the histogram transformation used for expressing Pr[ŝ|v] is too complicated. We therefore adopt the Powell optimization method, because it can be applied without computing gradients.</p>
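Since only function evaluations are needed, the optimization step can be sketched with SciPy's Powell method; the toy quadratic below stands in for the real objective (8):

```python
import numpy as np
from scipy.optimize import minimize

# Derivative-free minimization with Powell's method.  The quadratic
# toy_objective is an illustrative stand-in for the paper's objective,
# which has no convenient analytical gradient.
def toy_objective(theta):
    return float(np.sum((theta - np.array([1.0, -2.0])) ** 2))

result = minimize(toy_objective, x0=np.zeros(2), method="Powell")
# Powell's method needs only function evaluations, never gradients.
assert np.allclose(result.x, [1.0, -2.0], atol=1e-3)
```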
    </sec>
    <sec id="sec-10">
      <title>4. EXPERIMENTS</title>
      <p>We implemented the information neutral recommender system described in the previous section and applied it to a benchmark data set.</p>
    </sec>
    <sec id="sec-11">
      <title>4.1 A Data Set</title>
      <p>
        We used a Movielens 100k data set [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] in our experiments.
As described in section 3.2, we adopted the Powell method
for optimizing an objective function. Unfortunately, this
method is too slow to apply to a large data set, because the number of objective function evaluations needed to avoid computing gradients becomes very large. Therefore, we shrank the Movielens data set by extracting the events whose user ID and item ID were less than or equal to 200 and 300, respectively. This shrunken data set contained 9,409 events,
200 users, and 300 items.
      </p>
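The shrinking step amounts to a simple filter over events; the helper below is an illustrative sketch, not the script actually used:

```python
# Sketch of the data shrinking step: keep only the events whose user ID
# and item ID fall below the cutoffs used in the experiments.
def shrink(events, max_user=200, max_item=300):
    return [(u, i, r) for (u, i, r) in events if u <= max_user and i <= max_item]

# Toy events: (user ID, item ID, rating)
events = [(1, 5, 4.0), (250, 3, 3.0), (100, 301, 5.0)]
assert shrink(events) == [(1, 5, 4.0)]
```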
      <p>
        We tested the following two types of viewpoint variable. The first type, Year, represents whether a movie's release year is newer than 1990, and depends on the item part of an event. In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], Koren reported that older movies tend to be rated higher, perhaps because only masterpieces have survived. When adopting Year as the viewpoint variable, our recommender enhances the neutrality from this masterpiece bias. The second type, Gender, represents the user's gender, and depends on the user part of an event. Movie ratings may depend on the user's gender, and our recommender enhances the neutrality from this factor.
      </p>
    </sec>
    <sec id="sec-12">
      <title>4.2 Experimental Conditions</title>
      <p>
        We used the implementation of the Powell method in the
SciPy package [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] as the optimizer for the objective function (8). To initialize the parameters, the events in a training set, D, were first divided into two sets according to their viewpoint values. For each value of the viewpoint variable, the parameters were initialized by minimizing the objective function of the original latent factor model (equation (3)). For convenience of implementation, the loss term of the objective was scaled by dividing it by the number of training examples, and the L2 regularizer was scaled by dividing it by the number of parameters. We used a regularization parameter λ = 0.01 and set the number of latent factors, which is the length of the vectors p^(v) and q^(v), to K = 1. Because the original rating values are 1, 2, …, 5, we adopted five bins whose centers are 1, 2, …, 5 in equation (7). We performed a five-fold cross-validation procedure to obtain evaluation indices of the prediction accuracy and the neutrality from a viewpoint variable.
      </p>
    </sec>
    <sec id="sec-13">
      <title>4.3 Experimental Results</title>
      <p>
        Experimental results are shown in Figure 1. Figure 1(a) shows the changes of the prediction error measured by the mean absolute error (MAE) index; a smaller value of this index indicates better prediction accuracy. Figure 1(b) shows the changes of the mutual information between predicted ratings and viewpoint values; smaller mutual information indicates a higher level of neutrality. Mutual information is normalized into the range [0, 1] by the method employing the geometric mean in [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Note that the distribution Pr[ŝ|v] is required to compute this mutual information, and we used the same histogram model as in equation (7). The X-axes of these figures represent values of the parameter η, which balances the prediction accuracy and the neutrality. This parameter was changed from 0, at which the neutrality term is completely ignored, to 100, at which the neutrality is highly emphasized. Dashed lines and dotted lines show the results using Year and Gender as viewpoint variables, respectively.
      </p>
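The geometric-mean normalization of the mutual information can be sketched as follows; the exact formula, NMI = I(s; v) / sqrt(H(s) H(v)), is an assumption based on common usage of this normalization:

```python
import numpy as np

# Sketch of normalized mutual information using the geometric mean of
# the two marginal entropies, computed from a joint distribution over
# (binned rating, viewpoint value).
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def nmi(joint):
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    p_s, p_v = joint.sum(axis=1), joint.sum(axis=0)
    mi = 0.0
    for i in range(joint.shape[0]):
        for j in range(joint.shape[1]):
            if joint[i, j] > 0:
                mi += joint[i, j] * np.log(joint[i, j] / (p_s[i] * p_v[j]))
    # NMI = I(s; v) / sqrt(H(s) H(v)) lies in [0, 1].
    return mi / np.sqrt(entropy(p_s) * entropy(p_v))
```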
      <p>The MAE was 0.90 when offering the mean score, 3.74, to all users and all items. In Figure 1(a), the MAEs were better than this baseline, which is perfectly neutral from all viewpoints.</p>
      <p>Furthermore, the increase of the MAE as the neutrality parameter, η, grew was not serious. Turning to Figure 1(b), it demonstrates that the neutrality from both viewpoints, Year and Gender, is enhanced as the neutrality parameter, η, increases. Noting that the Y-axis is logarithmic, we can conclude that the information neutrality term is highly effective. In summary, our information neutral recommender system successfully enhanced the neutrality without seriously sacrificing prediction accuracy.</p>
      <p>Figure 2 shows the changes of the mean predicted scores. In both figures, the X-axes represent values of the parameter η, and the Y-axes represent the mean predicted scores for each viewpoint value. Figure 2(a) shows the mean predicted scores when the viewpoint variable is Year; dashed and dotted lines show the results under the conditions "before 1990" and "after 1991", respectively. Figure 2(b) shows the mean predicted scores when the viewpoint variable is Gender; dashed and dotted lines show the results obtained by setting the viewpoint to "male" and "female", respectively.</p>
      <p>We first discuss the case in which the viewpoint variable is Year. According to Figure 1(b), the neutrality was drastically improved in the interval of η between 0 and 10. Observing the corresponding interval in Figure 2, the two lines obtained for the different viewpoint values became close to each other. This means that the prediction scores became less affected by the viewpoint value, which corresponds to the improvement of neutrality. After this range, the decrease of NMI became smaller in Figure 1(b), and the lines in the corresponding interval in Figure 2 were nearly parallel. This indicates that the difference between the two score sequences changed less, and so did the improvement in neutrality. We move on to the Gender case. Comparing the changes of NMI between the Year and Gender cases in Figure 1(b), the decrease of NMI in the Gender case was much smaller than that in the Year case. This phenomenon can be confirmed by the fact that the two lines are nearly parallel in Figure 2(b). This is probably because the score differences in the Gender case are much smaller than those in the Year case at the point η = 0, and there is less margin for improvement. Further investigation will be required on this point.</p>
    </sec>
    <sec id="sec-14">
      <title>5. RELATED WORK</title>
      <p>
        To enhance the neutrality, we borrowed an idea from our
previous work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which is an analysis technique for fairness/discrimination-aware mining. Fairness/discrimination-aware mining is a general term for mining techniques designed so that sensitive information does not influence the mining results. In [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], Pedreschi et al. first advocated such mining techniques, emphasizing the unfairness of association rules whose consequents include serious determinations. Following this work, several techniques for detecting unfair treatment in mining results have been proposed [
        <xref ref-type="bibr" rid="ref14 ref25">14, 25</xref>
        ].
      </p>
      <p>These techniques might be useful for detecting biases in
recommendation.</p>
      <p>
        Another type of fairness-aware mining technique focuses on classification designed so that the influence of sensitive information on classification results is reduced [
        <xref ref-type="bibr" rid="ref10 ref11 ref3">11, 3, 10</xref>
        ]. These techniques would be directly useful in the development of an information neutral variant of content-based recommender systems, because content-based recommenders can be implemented by adopting classifiers.
      </p>
      <p>
        Information neutrality can be considered as diversity in
recommendation in a broad sense. McNee et al. pointed out
the importance of factors other than prediction accuracy,
including diversity, in recommendation [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Topic diversification is a technique for enhancing the diversity of a
recommendation list [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Smyth et al. proposed a method for
changing the diversity in a recommendation list based on a
user's feedback [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>
        There are several reports about the influence of recommendations on the diversity of items accepted by users. Celma et al. reported that recommender systems have a popularity bias, such that popular items tend to be recommended more and more frequently [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Fleder et al.
investigated the relation between recommender systems and
their impact on sales diversity by simulation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Levy et al.
reported that sales diversity could be slightly enriched by
recommending very unpopular items [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Because information neutral recommenders can be used to
avoid the exploitation of private information, these
techniques are related to privacy-preserving data mining [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Independent component analysis might be used to maintain
the independence between viewpoint values and
recommendation results [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In a broad sense, information neutral
recommenders are a kind of cost-sensitive learning technique
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], because these recommenders are designed to take into
account the costs of enhancing the neutrality.
      </p>
    </sec>
    <sec id="sec-15">
      <title>6. CONCLUSION</title>
      <p>In this paper, we proposed an information neutral
recommender system that enhances neutrality from a viewpoint
specified by a user. This system is useful for alleviating
the filter bubble problem: the concern that
personalization technologies narrow users' experience. We then
developed an information neutral recommendation algorithm
by introducing a regularization term that quantifies
neutrality as the mutual information between a predicted rating and
a viewpoint variable expressing a user's viewpoint. We
finally demonstrated that our algorithm can enhance neutrality
without sacrificing prediction accuracy.
The most serious issue of our current algorithm is
scalability, mainly due to the difficulty of deriving the
analytical form of the gradients of the objective function. We
plan to develop another objective function whose gradients
can be derived analytically. The degree of statistical
independence is currently quantified by mutual information; we
want to test other indexes, such as kurtosis, which are used
for independent component analysis. We will also develop
information neutral versions of other recommendation models,
such as pLSI/LDA or nearest neighbor models.</p>
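      <p>As an illustrative sketch (not the implementation used in this paper), the neutrality term described above can be estimated as the mutual information between a discretized predicted rating and a viewpoint variable; the plug-in estimator and all function names below are our own assumptions:</p>

```python
import math
from collections import Counter

def mutual_information(ratings, viewpoints):
    """Plug-in estimate of I(R; V) in nats from paired samples.

    ratings: discretized predicted ratings (e.g. rounded scores)
    viewpoints: viewpoint value paired with each rating (e.g. 0/1)
    """
    n = len(ratings)
    joint = Counter(zip(ratings, viewpoints))
    p_r = Counter(ratings)
    p_v = Counter(viewpoints)
    mi = 0.0
    for (r, v), c in joint.items():
        # p(r, v) * log( p(r, v) / (p(r) * p(v)) )
        mi += (c / n) * math.log((c * n) / (p_r[r] * p_v[v]))
    return mi

def neutrality_objective(pred, true, viewpoints, eta):
    """Squared error plus eta times the neutrality penalty I(R; V)."""
    err = sum((p - t) ** 2 for p, t in zip(pred, true))
    return err + eta * mutual_information(
        [round(p) for p in pred], viewpoints)
```

      <p>A larger trade-off weight (eta above) sacrifices prediction accuracy for neutrality; when predicted ratings are statistically independent of the viewpoint variable, the penalty vanishes.</p>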
    </sec>
    <sec id="sec-16">
      <title>7. ACKNOWLEDGMENTS</title>
      <p>We would like to thank the Grouplens research lab for
providing a data set. This work is supported by MEXT/JSPS
KAKENHI Grant Numbers 16700157, 21500154, 22500142,
23240043, and 24500194, and JST PRESTO 09152492.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          and P. S. Yu, editors.
          <source>Privacy-Preserving Data Mining: Models and Algorithms</source>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. Ben</given-names>
            <surname>Schafer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>E-commerce recommendation applications</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          ,
          <volume>5</volume>
          :
          <fpage>115</fpage>
          –
          <fpage>153</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Calders</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Verwer</surname>
          </string-name>
          .
          <article-title>Three naive Bayes approaches for discrimination-free classification</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          ,
          <volume>21</volume>
          :
          <fpage>277</fpage>
          –
          <fpage>292</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Celma</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cano</surname>
          </string-name>
          .
          <article-title>From hits to niches?: or how popular artists can bias music recommendation and discovery</article-title>
          .
          <source>In Proc. of the 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Elkan</surname>
          </string-name>
          .
          <article-title>The foundations of cost-sensitive learning</article-title>
          .
          <source>In Proc. of the 17th Int'l Joint Conf. on Artificial Intelligence</source>
          , pages
          <fpage>973</fpage>
          –
          <fpage>978</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fleder</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Hosanagar</surname>
          </string-name>
          .
          <article-title>Recommender systems and their impact on sales diversity</article-title>
          .
          <source>In ACM Conference on Electronic Commerce</source>
          , pages
          <volume>192</volume>
          –
          <fpage>199</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] GroupLens research lab, University of Minnesota.
          <source>http://www.grouplens.org/</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gunawardana</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Shani</surname>
          </string-name>
          .
          <article-title>A survey of accuracy evaluation metrics of recommendation tasks</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>10</volume>
          :
          <fpage>2935</fpage>
          –
          <fpage>2962</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hyvärinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karhunen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Oja</surname>
          </string-name>
          .
          <source>Independent Component Analysis</source>
          . Wiley-Interscience,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Kamiran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Calders</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pechenizkiy</surname>
          </string-name>
          .
          <article-title>Discrimination aware decision tree learning</article-title>
          .
          <source>In Proc. of the 10th IEEE Int'l Conf. on Data Mining</source>
          , pages
          <volume>869</volume>
          –
          <fpage>874</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kamishima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Akaho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sakuma</surname>
          </string-name>
          .
          <article-title>Fairness-aware learning through regularization approach</article-title>
          .
          <source>In Proc. of The 3rd IEEE Int'l Workshop on Privacy Aspects of Data Mining</source>
          , pages
          <volume>643</volume>
          –
          <fpage>650</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          .
          <article-title>Collaborative filtering with temporal dynamics</article-title>
          .
          <source>In Proc. of the 15th Int'l Conf. on Knowledge Discovery and Data Mining</source>
          , pages
          <volume>447</volume>
          –
          <fpage>455</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Levy</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Bosteels</surname>
          </string-name>
          .
          <article-title>Music recommendation and the long tail</article-title>
          .
          <source>In WOMRAD 2010: Recsys 2010 Workshop on Music Recommendation and Discovery</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini.</surname>
          </string-name>
          <article-title>k-NN as an implementation of situation testing for discrimination discovery and prevention</article-title>
          .
          <source>In Proc. of the 17th Int'l Conf. on Knowledge Discovery and Data Mining</source>
          , pages
          <volume>502</volume>
          –
          <fpage>510</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>McNee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          .
          <article-title>Accurate is not always good: How accuracy metrics have hurt recommender systems</article-title>
          .
          <source>In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems</source>
          , pages
          <fpage>1097</fpage>
          –
          <fpage>1101</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>E. Pariser.</surname>
          </string-name>
          <article-title>The filter bubble</article-title>
          .
          <source>http://www.thefilterbubble.com/</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>E. Pariser.</surname>
          </string-name>
          <article-title>The Filter Bubble: What The Internet Is Hiding From You</article-title>
          . Viking,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini</surname>
          </string-name>
          .
          <article-title>Discrimination-aware data mining</article-title>
          .
          <source>In Proc. of the 14th Int'l Conf. on Knowledge Discovery and Data Mining</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Resnick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Iacovou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Suchak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bergstrom</surname>
          </string-name>
          , and
          <string-name>
            <surname>J. Riedl.</surname>
          </string-name>
          <article-title>GroupLens: An open architecture for collaborative filtering of Netnews</article-title>
          .
          <source>In Proc. of the Conf. on Computer Supported Cooperative Work</source>
          , pages
          <volume>175</volume>
          –
          <fpage>186</fpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.</given-names>
            <surname>Resnick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Konstan</surname>
          </string-name>
          ,
          and
          <string-name>
            <surname>A. Jameson.</surname>
          </string-name>
          <article-title>Panel on the filter bubble</article-title>
          .
          <source>The 5th ACM conference on Recommender systems</source>
          ,
          <year>2011</year>
          . http://acmrecsys.wordpress.com/2011/10/25/panel-on-the-filter-bubble/.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.</given-names>
            <surname>Resnick</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Varian</surname>
          </string-name>
          .
          <article-title>Recommender systems</article-title>
          .
          <source>Communications of The ACM</source>
          ,
          <volume>40</volume>
          (
          <issue>3</issue>
          ):
          <volume>56</volume>
          –
          <fpage>58</fpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22] SciPy.
          <source>http://www.scipy.org/</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smyth</surname>
          </string-name>
          and
          <string-name>
            <surname>L. McGinty.</surname>
          </string-name>
          <article-title>The power of suggestion</article-title>
          .
          <source>In Proc. of the 18th Int'l Joint Conf. on Artificial Intelligence</source>
          , pages
          <fpage>127</fpage>
          –
          <fpage>132</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Strehl</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          .
          <article-title>Cluster ensembles – a knowledge reuse framework for combining multiple partitions</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>3</volume>
          :
          <fpage>583</fpage>
          –
          <fpage>617</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>I. Žliobaitė</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kamiran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Calders</surname>
          </string-name>
          .
          <article-title>Handling conditional discrimination</article-title>
          .
          <source>In Proc. of the 11th IEEE Int'l Conf. on Data Mining</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>S.</given-names>
            <surname>Watanabe</surname>
          </string-name>
          .
          <article-title>Knowing and Guessing – Quantitative Study of Inference and Information</article-title>
          . John Wiley &amp; Sons,
          <year>1969</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>C.-N. Ziegler</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          <string-name>
            <surname>McNee</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          <string-name>
            <surname>Konstan</surname>
            , and
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Lausen</surname>
          </string-name>
          .
          <article-title>Improving recommendation lists through topic diversification</article-title>
          .
          <source>In Proc. of the 14th Int'l Conf. on World Wide Web</source>
          , pages
          <volume>22</volume>
          –
          <fpage>32</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>