=Paper=
{{Paper
|id=None
|storemode=property
|title=Improving Novelty in Streaming Recommendation Using a Context Model
|pdfUrl=https://ceur-ws.org/Vol-889/paper3.pdf
|volume=Vol-889
}}
==Improving Novelty in Streaming Recommendation Using a Context Model==
Improving Novelty in Streaming Recommendation Using a Context Model∗

Doina Alexandra Dumitrescu and Simone Santini
Escuela Politécnica Superior
Universidad Autónoma de Madrid
Madrid, Spain
{doina.dumitrescu, simone.santini}@uam.es
ABSTRACT

In recent years there has been an increasing research interest in novelty/diversity detection in Information Retrieval and in Recommendation Systems. We propose a model that increases the novelty of recommendations using a context user profile created automatically by self-organizing maps. Our system was evaluated on the Reuters Corpus Volume 1, and our experiments show that filtering the recommended items using a novelty score derived from the context-based user profile provides better search results in terms of the novel information that is presented to the user.

Categories and Subject Descriptors: H.3.3 [Information Systems]: Information Search and Retrieval

General Terms: Algorithms, Metrics, Experimentation

Keywords: Novelty, Diversity, User Model, Context

∗This work was supported by the Ministerio de Educación y Ciencia under grant N. TIN2011-28538-C02, Novelty, diversity, context and time: new dimensions in next-generation information retrieval and recommender systems.

CARS-2012, September 9, 2012, Dublin, Ireland.
Copyright is held by the author/owner(s).

1. INTRODUCTION

The plethora of information available on the internet about virtually anything, and the duplication of this information due to the significant replication of the sources, has given great relevance to the issues of novelty and diversity in retrieval and recommendation systems [2, 5]. The definition of novelty and diversity generally accepted in information retrieval is that diversity helps the server deal with query ambiguity, while novelty helps the user deal with query underspecification [8]. Consider a query composed of the word manhattan. The query is ambiguous because the server doesn't know whether the user wants to know something about the neighborhood in New York, about the cocktail, or about the Indian tribe: the query has several possible interpretations, and whoever formulated the query is in general interested in only one of them. Without further specification, the safest bet for the server is to provide a diverse result set, that is, one that covers all these topics, possibly in an amount proportional to the estimated a priori user interest (one can assume, for instance, that more people are interested in New York than in the Indian tribe). Even if the query is not ambiguous and the results are about what the user really meant (say, the neighborhood in New York), the user would want every document to be novel, that is, to provide information that is not present in other documents. A highly informative text about the history and the human landscape of Manhattan is probably a very good first result, but another document almost identical to the first is not an equally desirable second result: the information that it contains has already been seen. That is, every interpretation of a query contains several aspects, and the person who formulated it will be interested, to some degree, in all of them.

These considerations highlight an important difference between diversity and novelty. Diversity is something that the user wouldn't really want: the person interested in the manhattan cocktail would be very happy to receive results only about the cocktail. Diversity is needed mainly by the server, which doesn't know enough about the user's true intentions. Novelty, on the other hand, is something the user wants, since she will be interested in all the aspects of her query. Novelty is a way to avoid redundancy, while diversity is a way to deal with the server's ignorance.

If we have access to a model of the user's interests, then it is possible to reduce ambiguity and, therefore, to reduce the need for diversity, concentrating only on increasing novelty. We present a method for increasing novelty that relies on a context model built on texts previously seen by the user. The scenario that we use here to exemplify the working of our method is that of the recommendation of news, and the context is composed of past news items that the user has considered interesting. This is just one possible way in which context can be gathered. Our model is very general and can be used to represent text or multimedia data. In past work, we have used, for example, the contents of the user's hard disk to represent his main interests [7]. In this paper we show and evaluate a modification to our model with the purpose of increasing the context coverage of the recommendations, that is, the fraction of the user context that the
recommended items will cover, trying to avoid items that refer to parts of the context already covered by previously seen items.

Figure 1: Schema of our stream recommendation system. [Diagram omitted: items from the stream pass through a filter driven by the context model, which undergoes both a short-time update and a permanent, irreversible update.]

2. STREAM RECOMMENDATION

One important distinction between this work and others on novelty and diversity in recommendation is that we are dealing with a stream of news, that is, with a continuous, potentially infinite interaction, while typical recommendation systems work based on sessions: the user chooses an item or requests a recommendation, and a finite set of results is shown. This operation may be repeated several times, as the user selects several items and receives correspondingly changing recommendations. No work, to the best of our knowledge, has considered novelty in a session-less situation, in which a countable stream of items arrives and a system has to decide whether each item is relevant for the user at that moment. Note that, differently from the case of a closed session, we don't have access to the whole database of items that can be recommended. We don't even have access to the items that have already been shown, since in a stream their number is unbounded. All we can do, at any given time, is take a decision considering only how the past history of items has modified the user context.

The system is shown schematically in figure 1. We have a stream of items (news, in our test scenario) that arrive continuously to the user. The user is described by the context model C (the details of this representation will be given in the next section), and each news item is compared with the context and assigned a score. Whenever the score is above a certain threshold (or a predefined criterion is met), the item is presented to the user. At the same time, the context undergoes a first kind of change, which we call a short term change. This change (used to improve the novelty of the recommended items) is characterized by a time constant β. After a time, depending on β, the change provoked by any individual item disappears. This is one important difference between our model and the session-based ones. Let us assume that a person is interested in politics. News about the election of the new French president will undoubtedly be of interest to her. However, she will probably not be interested in receiving many similar news items about the election one after the other. The effect of the short term change will be to "deactivate", for a time, the part of the context that deals with this kind of news, so that further news about the French elections will not, ceteris paribus, be considered as interesting. However, this situation should not be permanent: after a while (a few hours, a day, ...), further news about the French president will again be considered interesting, and should be recommended.

In addition to this short term change, the context evolves in a permanent and irreversible way, again under the influence of the documents read (as well as, possibly, based on other elements such as the documents that the user creates on her computer, or the emails that she writes). A number of strategies can be used for this evolution: the user can be asked to mark certain documents as being of special interest, the system can present only a summary and permanently enter into the context all the documents whose complete text is accessed, etc. We have considered the evolution of the context based on documents of interest elsewhere [7], and we shall not consider it here. From the point of view of this paper, therefore, the context, defined as in the following section, is fixed, and only the short term changes, needed to increase novelty, are considered.

3. THE USER MODEL

There is convincing evidence that a user model can help attain a higher precision in information retrieval and recommendation systems [1, 6]. In this paper, we are interested in using such a model to achieve novelty, that is, to obtain recommendations that span the whole gamut of the user's interests. The context model that we use was presented (without the extension to enforce novelty that we present here) in [7], to which the reader is referred for details; here we shall only give a brief description. The basis of the model is a set of documents, which can be composed of the documents on which the user is working at the time or, in our case, of news items marked in the past. The model doesn't change depending on the source of the documents. In the following, in order to make the tests self-contained, we shall always assume that the context is based on a set of news items that the user has supposedly seen in the past and found relevant.

All the documents in the context are considered as a single "macro-document" and processed accordingly. On this macro-document we perform stop-word removal and stemming. Each term t is assigned a weight w_t using tf-idf weighting [1]. We use the standard vector model representation of information retrieval, in which each different term corresponds to an axis in the word space. We consider n-grams, constituted of n consecutive terms that appear in the document. If the n-gram p_k is composed of the terms t_{u_1}, ..., t_{u_n}, with weights w_{u_1}, ..., w_{u_n}, then its representation in the word space is given by the point

    p_k = (1 / √(Σ_{i=1}^n w_{u_i}²)) (..., w_{u_1}, ..., w_{u_2}, ..., w_{u_n}, ...)

in the T-dimensional term space. Note that all points are normalized, so they are in reality points in the manifold S^{T−1} (the unit sphere in R^T). A context is represented by a set of these n-gram representations, what we call a point cloud representation. That is, a context is a finite set of points C ⊆ S^{T−1}.

In the word space, we lay a self-organizing map, using a modification of WEBSOM [3]. The map is a grid of elements called neurons, each one of which is a point in the word space and is identified by two integer indices; that is, a neuron is given as:

    [µν] = ([µν]_1, ..., [µν]_T)′,   1 ≤ µ ≤ N, 1 ≤ ν ≤ M    (1)
The map is discrete and two-dimensional¹, with the 4-neighborhood topology. The neurons are immersed in the T-dimensional word space ([µν] ∈ R^T), and in our version of WEBSOM their weights are normalized. That is, throughout the algorithm we enforce Σ_{t=1}^T [µν]_t² = 1 for all map indices µν. The neurons are at the same time elements of the discrete grid and points in S^{T−1}. As points in the discrete grid, the relevant distance between two neurons is their graph distance:

    δ([ζξ], [µν]) = |ζ − µ| + |ξ − ν|    (2)

As points in S^{T−1}, it is possible to determine the similarity between a neuron and any other point p ∈ S^{T−1} as

    s([µν], p) = Σ_{i=1}^T [µν]_i p_i    (3)

On this map we define a neighborhood function h(t, n), which depends on two parameters t, n ∈ N: n is the graph distance between two given neurons, and t is a time parameter that increases as learning proceeds. The function h(t, n) represents the "degree of neighborhood-ness" of two neurons at a distance n at time t. We assume that 0 ≤ h(t, n) ≤ 1, that h(t, 0) = 1, and that h is monotonically decreasing in n and t. The degree to which neuron [ζξ] belongs to the neighborhood of neuron [µν] at time t is given by h(t, δ([ζξ], [µν])). In addition to the neighborhood, we define a positive learning parameter α(t), t ∈ N, monotonically decreasing with t.

In order to create the map for a context C, all the points in it are presented to the map, and the training algorithm is applied. We call the presentation of a point p ∈ C an event of learning, and the presentation of all the points of C an epoch. Learning consists of a number of epochs, counted by a counter t. The neurons of the map are at first spread randomly in the word space; then, for each event consisting of the presentation of the point p, the neuron with the highest similarity to p is found:

    [∗] = arg max_{[µν]} s(p, [µν]);    (4)

the neuron [∗] and all its neighbors are shifted towards p. The amount of this shift depends on the learning parameter α and on the distance from [∗] on the map:

    ∀[µν]   [µν] ← [µν] + α(t) h(t, δ([∗], [µν])) · (p − [µν])    (5)

Finally, all neurons are re-normalized in order to keep them on the sphere S^{T−1}:

    ∀[µν]   [µν] ← [µν] / √(Σ_i [µν]_i²)    (6)

¹The map can be k-dimensional; however, maps with k > 4 are rarely used because of the combinatorial explosion in the number of neurons, which would lead to over-fitting most training sets. We have used mostly maps with k = 2, and we shall consider only this case here, mainly because it avoids notational complexities.

4. NOVELTY IN NEWS FILTERING

If we don't consider novelty, we can filter the news items as they arrive simply by representing each of them as a point u ∈ S^{T−1} (considering the item as a document and using the same information retrieval techniques that we have used to represent the context), and determining the maximal similarity between u and the neurons of the map:

    r = max_{[µν]} s([µν], u)    (7)

The items that yield a value of r above a certain threshold are considered interesting and presented to the user. As we have mentioned in the introduction, the problem with this simple solution is that very similar news items can all have a high score because they are similar to the same neuron (that is, to the same part of the user context), although the presentation of more than one of them adds little to the information content of the first. To increase the diversity of the results, we rewrite (7), adding an interest factor wµν to each neuron and computing the relevance of the news item u as

    r = max_{[µν]} wµν s([µν], u)    (8)

Let us call [∗] the "winning" neuron for a given item (the neuron closest to it, viz. the neuron for which the maximum of eq. (8) is attained). Whenever an item is recommended, the interest factor of each neuron is updated according to the equation

    wµν ← λ wµν                  if [µν] = [∗]
    wµν ← min{1, (1 + β) wµν}    otherwise    (9)

with 0 < λ < 1 and β > 0. According to this equation, the interest of the neuron closest to a recommended news item is reduced by a factor λ, reducing its contribution to (8) and therefore making it less likely to win again. This means that further news items close to the one that caused this neuron to win will be less likely to be recommended. If the neuron is no longer the winner for any item, its interest will be increased by a factor (1 + β) with each recommended item, until the value 1 is restored (or until the neuron wins again, in which case its interest factor will again be reduced). If the neuron wins only once, its relaxation time (the number of items necessary for it to go back to 1) is

    t_r = − log λ / log(1 + β)    (10)

If we want to achieve effective diversity, this time should be of the order of magnitude of the number of neurons in the network, but not greater: if neurons win too often, their interest factor will decrease, on average, as a function of time. A reasonable choice is to set t_r ≈ K/10, where K is the number of neurons. Expressing, say, β as a function of λ and t_r, we obtain

    β = (1/λ)^(1/t_r) − 1    (11)

so, for the desired recovery span, we have

    β = (1/λ)^(10/K) − 1    (12)

In this schema, we only reduce the interest factor of the winning neuron [∗]. In our tests, we also tried another schema, in which we reduce (by a progressively smaller amount) the interest factor of the neurons in the neighborhood of the winner, using the same function h that we have used in the training algorithm. In this case, the update of the interest factor is done with the function

    wµν ← λ h(0, δ([µν], [∗])) wµν    if δ([µν], [∗]) < τ
    wµν ← min{1, (1 + β) wµν}        otherwise    (13)

where h is the neighborhood function, and τ is a suitable threshold (in our experiments, we set τ = 2).

5. TESTS

We consider two measures. The first one is the standard precision (the fraction of recommended documents that are relevant). Given a context and a set of recommendations, we expect that the use of novelty will somewhat decrease precision (if there are two very relevant documents that are almost identical, a system that considers novelty will recommend only one of them). So, this measure is a sort of quality control: we do accept a certain reduction of precision, but we want to keep it under control, to avoid losing too much result quality. The second measure is a determination of novelty. We measure, for a given number of recommended items, the number of neurons that win. If D is the number of documents recommended and, during the recommendation of these documents, n_w different neurons are winners (i.e. they are the neurons that achieve the maximum in eq. (8)), then the coverage is defined as V = n_w / D (0 < V ≤ 1). Note that n_w < K, so that if the documents arrive in a stream (D → ∞), in the long run this value will tend to zero. In order to obtain a significant value, we always consider sets of recommended documents smaller than the number of neurons in the network.

To validate the proposed approach, we conducted some experiments using 2.5 gigabytes of uncompressed news stories from Reuters Corpus Volume 1 (RCV1-v1) [4]. The Reuters corpus is a collection of 806,791 news stories, each one associated with some metadata. In our experiments, we used the text and the topic categories each news document belongs to. The RCV1-v1 test collection contains 117 topics in a three-level hierarchy with, at the top, four categories: CCAT (Corporate/Industrial), ECAT (Economics), GCAT (Government/Social), MCAT (Market). Fourteen of these categories contain no document, and in our experiments we used only the 103 that do.

We began by creating a context. To this end, we used 5% of the news items contained in the sub-categories (of MCAT) M11 (EQUITY MARKETS) and M13 (MONEY MARKETS), for a total of about 5000 documents. Once the context is in place, we create a stream of news with all the elements in the Reuters database and use our method for recommending items that are close to the context. As a ground truth, we consider as relevant all the items that are in the context categories (M11 and M13) and in their parent category (MCAT), but not those that are in the sibling categories of M11 and M13. We stop the stream when 500 items have been recommended (this is necessary in order to get significant coverage values, as we mentioned before). Each item is analyzed and represented as a point u ∈ S^{T−1} (see section 4). The weights of the terms that compose each item are determined according to two different schemas: simple weighting (without an inverse document frequency (idf) term; the weight of a word is calculated using the normalized term frequency (tf)), and tf-idf (the normalized frequency of a word is divided by the logarithm of its frequency in the British National Corpus (BNC) word frequency list²).

Figures 2–5 show the results of our tests. Each figure contains three curves: the first is obtained by filtering the items without taking novelty into account (keeping the interest factor of each neuron always equal to 1). For the second, we used the winner-only adaptation of eq. (9), with λ = 0.1 and a relaxation time long enough that, once a neuron has won, its interest factor remains low for the duration of the test; the third is obtained using the neighborhood adaptation of eq. (13), with τ = 2. The values of λ and τ are not critical, and some preliminary tests on limited data sets allowed us to select reasonable values. Figures 2 and 3 show the precision and the coverage V for filtering done with simple weighting (without an idf term). The precision drops only marginally when considering novelty, but the number of neurons that "win" is increased by more than an order of magnitude, confirming that the results are relative to a broader portion of the user context.

Figure 2: Precision vs. number of recommended items (simple weighting). [Plot omitted: precision of the three systems (novelty: no update, winner only, neighborhood) over 0–500 results.]

Figure 3: Coverage vs. number of recommended items (simple weighting). [Plot omitted: coverage of the three systems, on a logarithmic scale, over 0–500 results.]

²Available at http://www.natcorp.ox.ac.uk/
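The short-term adaptation used in these tests, i.e. the winner-only update of eq. (9) together with the relaxation time of eqs. (10)–(12), can be sketched as follows. This is our own illustrative Python, not the authors' implementation; the class name and the parameter values are ours:

```python
class InterestFactors:
    """Winner-only short-term update of eq. (9): the winning neuron's
    interest drops by a factor lam, every other neuron recovers by a
    factor (1 + beta), capped at 1."""
    def __init__(self, n_neurons, lam=0.1, t_r=None):
        if t_r is None:
            t_r = n_neurons / 10                         # the suggested t_r = K / 10
        self.lam = lam
        self.beta = (1.0 / lam) ** (1.0 / t_r) - 1.0     # eq. (11)
        self.w = [1.0] * n_neurons

    def update(self, winner):
        for i in range(len(self.w)):
            if i == winner:
                self.w[i] *= self.lam                    # deactivate this part of the context
            else:
                self.w[i] = min(1.0, (1.0 + self.beta) * self.w[i])

factors = InterestFactors(n_neurons=100, lam=0.1)  # relaxation time t_r = 10 items
factors.update(winner=7)            # neuron 7 wins: its interest drops to 0.1
for _ in range(10):                 # t_r further recommendations with
    factors.update(winner=3)        # other winners...
assert factors.w[7] > 0.999         # ...and neuron 7's interest has recovered
```

The final assertion illustrates the relaxation behavior: after t_r recommendations in which the neuron does not win, its interest factor is back at (essentially) 1, so the corresponding part of the context is "reactivated".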
Figures 4 and 5 show the same results using a weighting scheme with the idf term for the news items. The results are qualitatively the same, although we observe a general drop in precision. We have not investigated the causes of this drop, which we suspect are related to the small number of words in the items. Note that in the range 200–500 results, the system with novelty actually achieves a better precision than the simple system. We haven't thoroughly investigated this phenomenon, but it is possible that the simple system gets "stuck" in groups of similar documents from other categories with a similar (and relatively high) value of measured relevance. Introducing novelty in this case would reduce the presence of these groups.

Figure 4: Precision vs. number of recommended items (tf-idf weighting). [Plot omitted: precision of the three systems (novelty: no update, winner only, neighborhood) over 0–500 results.]

Figure 5: Coverage vs. number of recommended items (tf-idf weighting). [Plot omitted: coverage of the three systems, on a logarithmic scale, over 0–500 results.]

We must also remark that the MCAT category is quite difficult to filter, as many of the significant terms that appear in it are part of the economics jargon and appear in other categories as well, such as ECAT. This explains the relatively low values of precision that we obtain, independently of the use of the method for increasing novelty. This might also explain why in some cases the novelty system attains a higher precision than the one based only on relevance: there are probably parts of the context more ambiguous than others, since they are sensitive to words that appear in categories other than MCAT. With the novelty system, once an item has activated these parts, they become "desensitized," and cease to produce irrelevant results.

6. CONCLUSION

In this paper, we have presented a method for increasing the novelty of a recommendation system that receives a stream, as opposed to the session-based recommender systems common in the literature. We have shown that a model of the user context can be useful to increase the novelty of the recommended results without having to receive results irrelevant for the user. The fact that we receive a potentially infinite stream of items poses new challenges for novelty-increasing systems, as redundancy, which is taken to be constant in session-based systems, "fades away" in the case of a stream: a result very similar to one already received will be considered redundant in the short term, but may become interesting again in the long term. To deal with this problem, we have introduced an interest factor in our model that is reduced when a certain area of the context is activated but that, if the area is no longer activated, recovers in a time that can be controlled by design.

7. REFERENCES

[1] J. Allan et al. Challenges in information retrieval and language modeling: report of a workshop held at the Center for Intelligent Information Retrieval. SIGIR Forum, 37(1):31–47, Apr. 2003.

[2] E. Gabrilovich, S. Dumais, and E. Horvitz. Newsjunkie: providing personalized newsfeeds via analysis of information novelty. In WWW '04: Proceedings of the 13th International Conference on World Wide Web, pages 482–490, New York, NY, USA, 2004. ACM.

[3] S. Kaski. Computationally efficient approximation of a probabilistic model for document representation in the WEBSOM full-text analysis method. Neural Processing Letters, 5(2), 1997.

[4] D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. J. Mach. Learn. Res., 5:361–397, Dec. 2004.

[5] X. Li and W. B. Croft. An information-pattern-based approach to novelty detection. Information Processing & Management, 44(3):1159–1188, May 2008.

[6] G. Pasi. Contextual search: Issues and challenges. In A. Holzinger and K.-M. Simonic, editors, Information Quality in e-Health, volume 7058 of Lecture Notes in Computer Science, pages 23–30. Springer Berlin / Heidelberg, 2011. doi:10.1007/978-3-642-25364-5_3.

[7] S. Santini and A. Dumitrescu. Context as a non-ontological determinant of semantics. In Proceedings of the 3rd International Conference on Semantics and Digital Media Technologies, 2008.

[8] R. Song, Z. Luo, J.-Y. Nie, Y. Yu, and H.-W. Hon. Identification of ambiguous queries in web search. Information Processing & Management, 45(2):216–229, 2009.