=Paper=
{{Paper
|id=None
|storemode=property
|title=Model adaptation with Bayesian hierarchical modeling for context-aware recommendation
|pdfUrl=https://ceur-ws.org/Vol-791/paper2.pdf
|volume=Vol-791
}}
==Model adaptation with Bayesian hierarchical modeling for context-aware recommendation==
Model Adaptation with Bayesian Hierarchical Modeling for Context-Aware Recommendation

Hideki Asoh
AIST
AIST Tsukuba Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568 Japan
h.asoh@aist.go.jp

Yoichi Motomura
AIST
AIST Tokyo Waterfront, 2-3-26 Aomi, Koutouku, Tokyo 135-0064 Japan
y.motomura@aist.go.jp

Chihiro Ono
KDDI R & D Laboratories Inc.
2-1-15 Ohara, Fujimino, Saitama 365-8502 Japan
ono@kddilabs.jp

ABSTRACT
Model adaptation is the process of modifying a model trained with a large amount of training data from a source domain so that it fits a specific, similar target domain, using a small amount of adaptation data from the target domain. Bayesian hierarchical modeling is well known as a general tool for model adaptation and multi-task learning, and is widely used in areas such as marketing, ecology, medicine, and education to model the heterogeneity in the phenomena. In this work, we propose applying Bayesian hierarchical modeling to the problem of preference modeling, where a model trained with a large amount of supposed context data is adapted to the real context by using a small additional amount of real context data. The effectiveness of the proposed method is evaluated by experiments using context-aware food preference data.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous

General Terms
Experimentation, Human factors, Measurement

Keywords
Model Adaptation, Preference Modeling, Context Awareness

CARS-2011, October 23, 2011, Chicago, Illinois, USA.
Copyright is held by the author/owner(s).

1. INTRODUCTION
Modeling users' preferences is an important element of recommender systems. We have constructed several context-aware attribute-based recommender systems. The systems use Bayesian networks for modeling users' preferences [2, 16]. In the course of their construction, it is necessary to collect a large amount of data about users' preferences through inquiries. In particular, to make the model context-aware, users' preference data should be collected under various contexts. However, putting the subjects of inquiries into various contexts and collecting answers from them is often difficult and costly. Hence, answers are often collected in supposed contexts, i.e., contexts where the subjects pretend or imagine that they are in the specific contexts. Although there may be differences between the preferences in the real contexts and in the supposed contexts, the differences are not taken seriously.

In our previous work, we collected users' preferences for various dishes in both real and supposed contexts and showed that the difference is statistically significant and not negligible [17]. We also analyzed the statistical nature of the differences and demonstrated that the structure of preferences in supposed contexts is simpler than that of the preferences in real contexts [3]. These studies suggested that it is dangerous to construct preference models using data collected only in supposed contexts.

In this work, we pursue the possibility of constructing better preference models by combining data from the supposed contexts and the real contexts. Although there are differences between the preferences in the real contexts and in the supposed contexts, they are similar to some extent, and collecting data in the supposed contexts is much cheaper than in the real contexts. Hence, if we can modify a model constructed from a large amount of supposed context data so that it adapts to the real contexts by using a small amount of real context data, this will help greatly in realizing better context-aware recommender systems at lower cost.

Problems of this kind are known as "model adaptation", "learning to learn", "transfer learning", or "multi-task learning" in the area of statistical machine learning, and have been studied actively in recent years [6, 13, 15, 20]. This area studies methods for obtaining good learning results (statistical models of data) by combining data from different but similar domains. Typical examples are acoustic model adaptation and language model adaptation in speech recognition systems [11, 9, 19]. Collaborative filtering can also be considered a case of multi-task learning [24].

Several methods have been proposed for model adaptation. In this work, we exploit the methods using Bayesian hierarchical modeling [7, 8] because of their simplicity and naturalness. We construct a hierarchical model for preference model adaptation by combining real and supposed context data, and evaluate the model using the food preference data.

The rest of the paper is organized as follows. Section 2 briefly introduces Bayesian hierarchical modeling and formulates our model for model adaptation in context-aware preference modeling. Section 3 describes experiments using food preference data, and Section 4 gives the conclusion and future work.

2. BAYESIAN HIERARCHICAL MODELING
Bayesian hierarchical modeling is an effective method for simultaneous estimation of several parameters over similar domains, and is used to capture heterogeneity of subjects in areas such as marketing and ecology [5, 12, 18].

We have already proposed applying the following simple linear Gaussian hierarchical model to the problem of constructing a context-aware preference model that can model and predict ratings $r_{ucs}$ by users $u$ for items $c$ in contexts $s$ [4]:

\[
\begin{aligned}
r_{ucs} &\sim \mathrm{normal}(\mu_{ucs}, 1/\tau), \\
\mu_{ucs} &= \mu_0 + a_u + b_c + c_s, \\
\tau &\sim \mathrm{gamma}(\nu, \theta), \\
\mu_0 &\sim \mathrm{normal}(\mu, \sigma^2), \\
a_u &\sim \mathrm{normal}(0, 1/\tau_a), \\
b_c &\sim \mathrm{normal}(0, 1/\tau_b), \\
c_s &\sim \mathrm{normal}(0, 1/\tau_c), \\
\tau_a &\sim \mathrm{gamma}(\nu, \theta), \\
\tau_b &\sim \mathrm{gamma}(\nu, \theta), \\
\tau_c &\sim \mathrm{gamma}(\nu, \theta).
\end{aligned}
\]

Here, $\mathrm{normal}(\mu, 1/\tau)$ denotes the Gaussian distribution with mean $\mu$ and variance $1/\tau$, and gamma denotes the Gamma distribution.

In this paper, we extend the above model for model adaptation by combining real and supposed context data as follows:

\[
\begin{aligned}
r^{(r)}_{ucs} &\sim \mathrm{normal}(\mu^{(r)}_{ucs}, 1/\tau), \\
r^{(s)}_{ucs} &\sim \mathrm{normal}(\mu^{(s)}_{ucs}, 1/\tau), \\
\mu^{(r)}_{ucs} &= \mu^{(r)}_0 + a^{(r)}_u + b^{(r)}_c + c^{(r)}_s, \\
\mu^{(s)}_{ucs} &= \mu^{(s)}_0 + a^{(s)}_u + b^{(s)}_c + c^{(s)}_s, \\
\tau &\sim \mathrm{gamma}(\nu, \theta), \\
\mu^{(r)}_0 &\sim \mathrm{normal}(\mu, \sigma^2), \\
\mu^{(s)}_0 &\sim \mathrm{normal}(\mu, \sigma^2), \\
a^{(r)}_u &\sim \mathrm{normal}(0, 1/\tau_a), \\
b^{(r)}_c &\sim \mathrm{normal}(0, 1/\tau_b), \\
c^{(r)}_s &\sim \mathrm{normal}(0, 1/\tau_c), \\
a^{(s)}_u &\sim \mathrm{normal}(0, 1/\tau_a), \\
b^{(s)}_c &\sim \mathrm{normal}(0, 1/\tau_b), \\
c^{(s)}_s &\sim \mathrm{normal}(0, 1/\tau_c), \\
\tau_a &\sim \mathrm{gamma}(\nu, \theta), \\
\tau_b &\sim \mathrm{gamma}(\nu, \theta), \\
\tau_c &\sim \mathrm{gamma}(\nu, \theta).
\end{aligned}
\]

Here, $r^{(r)}_{ucs}$ denotes a rating in a real context, and $r^{(s)}_{ucs}$ denotes the corresponding rating in a supposed context. This model is composed of two hierarchical context-aware preference models: the generative model of the real context data and the generative model of the supposed context data. They are connected through the common hyper-hyper parameters $\tau, \tau_a, \tau_b, \tau_c$. Through these common hyper-hyper parameters, information in the ratings in supposed contexts can affect the posterior probability distribution of predicted ratings in the real context model.

3. EXPERIMENTS
We applied the proposed model to our context-aware food preference data and evaluated the accuracy of predicted ratings in the real contexts for unknown cases.

3.1 Data acquisition and preparation
In our previous work [17], we designed an internet questionnaire survey to collect corresponding data; that is, we asked subjects the same question about food preference in both real and supposed contexts and collected pairs of answers. The target contents were typical dishes served in food courts.

The survey was composed of two questionnaire surveys. The first questionnaire survey was conducted on 16 and 17 December 2008. The number of subjects was 746. Each subject evaluated 5 kinds of a la carte dishes randomly selected from 20 kinds of dishes such as "chicken steak", "beef steak", "beef curry", "pasta with cod roe", "Japanese noodle", etc., using a 5-grade rating scale from "I do not want to order the dish at all" to "I want to order the dish very much". At the same time, the subjects reported their current degree of hunger in 3 levels (hungry, normal, full).

After that, the subjects were asked to imagine that they were in a degree of hunger different from the current one, and rated their preference for the same 5 dishes. In total, preferences for 5 dishes in three different contexts (degrees of hunger) were collected. Among the three contexts, one is real and two are supposed.

The second survey was conducted on different days, from 22 to 24 December 2008. All subjects who answered the first survey were asked the same questions as in the first survey, and we extracted the subjects who reported a degree of hunger different from that in the first survey. After filtering out unreliable subjects, the number of extracted subjects was 212.

By combining the results of the two surveys, we obtained corresponding preferences for 5 dishes in 2 different degrees of hunger per subject. Hence the total number of ratings was 2,120 (212 subjects × 5 dishes × 2 degrees of hunger). Figure 1 shows the whole data set. Figure 2 shows examples of answers in the two surveys, and examples of combined corresponding data.

[Figure 1 diagram: answers in real contexts versus answers in supposed contexts. Independent supposed context data (S=si, U=ui, C=ci, V=vSi), 2,120 records, for which no real context data is available; corresponding real context data (S=sk, U=uk, C=ck, V=vRk), 2,120 records; corresponding supposed context data (S=sk, U=uk, C=ck, V=vSk), 2,120 records; combined corresponding real and supposed context data (S=sk, U=uk, C=ck, V=vRk, V=vSk), 2,120 records.]
Figure 1: Structure of the whole data set [17]

1st Survey
Subject   Food     Real     Supposed   Preference
1         Noodle   Full     ×          3
1         Noodle   Full     Normal     2
1         Noodle   Full     Hungry     2

2nd Survey
Subject   Food     Real     Supposed   Preference
1         Noodle   Hungry   ×          1
1         Noodle   Hungry   Normal     1
1         Noodle   Hungry   Full       2

Combined corresponding data
Subj.   Food Menu    Context   Real Preference   Supposed Preference
1       Noodle       Full      3                 2
1       Noodle       Hungry    1                 2
3       Steak        Full      2                 3
3       Steak        Normal    2                 2
1       Fried Rice   Hungry    1                 2
1       Fried Rice   Full      2                 3

Figure 2: Examples of ratings in two surveys and combined corresponding data [3]

Table 1: Average of mean squared error for 10 experiments
L (real context ratings per subject)   Supposed Context Data + Real Context Data   Only Real Context Data
0                                      1.95 (0.10)
1                                      1.66 (0.17)                                 1.71 (0.20)
2                                      1.50 (0.14)                                 1.49 (0.14)
3                                      1.51 (0.11)                                 1.51 (0.12)
4                                      1.48 (0.12)                                 1.48 (0.12)
5                                      1.43 (0.13)                                 1.43 (0.12)
6                                      1.43 (0.13)                                 1.42 (0.11)
7                                      1.42 (0.11)                                 1.42 (0.11)
8                                      1.40 (0.12)                                 1.39 (0.13)
9                                      1.40 (0.12)                                 1.39 (0.13)

We divided the dataset into training data and test data. First, we randomly left one real context rating out of the 10 ratings of each subject for evaluation. The remaining 9 ratings per subject in real contexts are used as training data. In order to evaluate the effect of the number of real context training data on preference model construction, we varied the number of real context ratings per subject used for model construction from 0 (supposed only) to 9. For the supposed context data, all 10 ratings per subject are used for model construction.

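As an illustration of the extended model of Section 2 and of the split described above, the following Python sketch draws synthetic ratings and builds one train/test division. It is not the authors' R/WinBUGS code and performs no posterior inference. The function names `sample_ratings` and `split_real_ratings` are hypothetical, gamma(ν, θ) is assumed to mean shape ν and rate θ, every subject is assumed to rate 5 dishes in 2 contexts, and ratings are left continuous instead of being mapped to the 5-grade scale.

```python
import numpy as np


def sample_ratings(n_users, n_items, n_contexts, rng,
                   mu=2.0, sigma=10.0, nu=2.0, theta=1.0):
    """Draw synthetic real and supposed ratings from the two-branch model.

    Assumption: gamma(nu, theta) is read as shape nu and rate theta.
    Ratings are left continuous; the 5-grade scale is not modelled.
    """
    # Precisions shared by the real and supposed branches.
    tau, tau_a, tau_b, tau_c = rng.gamma(shape=nu, scale=1.0 / theta, size=4)
    ratings = {}
    for branch in ("real", "supposed"):
        mu0 = rng.normal(mu, sigma)  # branch-specific overall mean
        a = rng.normal(0.0, tau_a ** -0.5, size=n_users)     # user effects
        b = rng.normal(0.0, tau_b ** -0.5, size=n_items)     # item effects
        c = rng.normal(0.0, tau_c ** -0.5, size=n_contexts)  # context effects
        mean = mu0 + a[:, None, None] + b[None, :, None] + c[None, None, :]
        noisy = rng.normal(mean, tau ** -0.5)
        # One row per user, one column per (item, context) pair.
        ratings[branch] = noisy.reshape(n_users, n_items * n_contexts)
    return ratings


def split_real_ratings(n_users, n_per_user, L, rng):
    """Hold out one real rating per user and pick L of the rest for training."""
    test_idx = np.empty(n_users, dtype=int)
    train_idx = np.empty((n_users, L), dtype=int)
    for u in range(n_users):
        perm = rng.permutation(n_per_user)
        test_idx[u] = perm[0]          # the left-out real context rating
        train_idx[u] = perm[1:1 + L]   # L of the remaining 9 ratings
    return test_idx, train_idx


rng = np.random.default_rng(0)
data = sample_ratings(n_users=212, n_items=5, n_contexts=2, rng=rng)
test_idx, train_idx = split_real_ratings(212, 10, L=3, rng=rng)
print(data["real"].shape, data["supposed"].shape)  # (212, 10) (212, 10)
print(train_idx.shape)                             # (212, 3)
```

In this sketch the two branches share only the four precisions, which is the coupling the model relies on. All 10 supposed ratings per subject would be passed to the model unchanged, while only the columns selected by `train_idx` would be used from the real ratings.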
We repeated the experiment 10 times with different divisions of the real context data and evaluated the accuracy of the predicted ratings for the left-out test data in real contexts. We computed the average and standard deviation of the mean squared error (MSE) of the predictions. We also evaluated the prediction accuracy of the model constructed with only real context data.

Experiments were conducted with the open source statistical computing software R and WinBUGS, software for Bayesian Monte Carlo simulation [14, 22]. To connect R to WinBUGS, we used the R package R2WinBUGS. We set µ = 2.0, σ = 10, ν = 2.0, θ = 1.0; however, the results are robust with respect to the values of these parameters.

3.2 Result and discussion
Table 1 shows the average and standard deviation of MSE for each number of real context ratings per subject, L = 0, ..., 9. Standard deviations are given in brackets. We also visualize the average of the MSE values in Figure 3.

[Figure 3 plot: average MSE (vertical axis, 1.2 to 2.0) against L (horizontal axis, 0 to 9) for two conditions, "Supposed Context Data + Real Context Data" and "Real Context Data Only".]
Figure 3: Effect of Model Adaptation

These results demonstrate that as the number of real context ratings L increases, the MSE of the predicted ratings in real contexts decreases almost monotonically. Hence, model adaptation by combining a small amount of real context data with a large amount of supposed context data is verified to be effective.

In particular, the performance for L = 1, that is, constructing the model with 10 supposed context ratings + 1 real context rating, is much better than the performance for L = 0, that is, constructing the model only with supposed context data. This demonstrates that:

• Constructing a preference model with only supposed context data is dangerous.
• A very small amount of real context data can improve the model.

However, the results also showed that the models constructed with only a small number of real context ratings perform rather well. Even when using only 2 real context ratings per subject, the performance of the model is almost equal to the performance of the model constructed by combining supposed and real context data. This is because Bayesian hierarchical models are able to make robust predictions even when the number of training data is very small. This means that using the supposed context data is effective only in the L = 1 (cold start) case.

4. CONCLUSION AND FUTURE WORK
In this paper, we proposed applying Bayesian hierarchical modeling to preference model adaptation by combining real and supposed context data. The results of the experiments with food preference data demonstrate that the model adaptation is effective, in particular for cases where only a very small amount of real context data is available. This means that the model adaptation provides a solution for the cold start problem in context-aware recommender systems. Note
that Umyarov and Tuzhilin observed a very similar phenomenon in a different context. They showed that a small amount of aggregated external rating data can significantly improve the performance of a Bayesian hierarchical preference model [21].

There are several directions for future work. The first is more intensive evaluation. In this paper we evaluated the method with a small-scale dataset. As the number of users, items, and contexts increases, more training data is necessary for constructing good preference models. Hence the importance of model adaptation is expected to increase. Evaluating with data in domains other than food preference is also important.

The second is to apply the method to different base models. The model adaptation technique with Bayesian hierarchical modeling is independent of the generative model of ratings. In this work, we used the simple linear Gaussian model of rating generation. More elaborate generative models of ratings, such as the probabilistic tensor factorization model [10, 23], can be used instead. Using generative models for ordered ratings may also be effective.

The third is to investigate other model adaptation techniques. The proposed model adaptation technique is bi-directional. This means that the combined model is symmetric with respect to the source and target domains. Investigating more directional model adaptation techniques is an interesting topic for future work.

Acknowledgments.
We thank Dr. Yasuyuki Nakajima, President and CEO of KDDI R&D Laboratories Inc., for his continuous support of this study. This work was supported in part by JSPS KAKENHI 20650030.

5. REFERENCES
[1] A. Ansari, S. Essegaier, and R. Kohli. Internet recommendation systems. Journal of Marketing Research, vol. 37, no. 3, 2000.
[2] H. Asoh, C. Ono, and Y. Motomura. A movie recommendation method considering both users' personality and situation. In Proceedings of the ECAI2006 Workshop on Recommender Systems, pp. 45-48, 2006.
[3] H. Asoh, C. Ono, and Y. Motomura. An analysis of differences between preferences in real and supposed contexts. In Proceedings of 2nd Workshop on Context-Aware Recommender Systems (CARS-2010), 2010.
[4] H. Asoh, C. Ono, and Y. Motomura. A Bayesian hierarchical preference model for context-aware recommendations. Adjunct Proceedings of UMAP 2010, 2010.
[5] P. Congdon. Bayesian Statistical Modelling, Second Edition. Wiley, 2006.
[6] H. Daume III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, vol. 26, pp. 101-126, 2006.
[7] H. Daume III. Bayesian multitask learning with latent hierarchy. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), 2009.
[8] J. R. Finkel and C. D. Manning. Hierarchical Bayesian domain adaptation. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the ACL, pp. 602-610, 2009.
[9] D. Gildea and T. Hofmann. Topic-based language models using EM. In Proceedings of 6th European Conference on Speech Communication and Technology (Eurospeech 99), pp. 2167-2170, 1999.
[10] A. Karatzoglou, X. Amatriain, N. Oliver, and L. Baltrunas. Multiverse recommendation: N-dimensional tensor factorization for context-aware collaborative filtering. In Proceedings of ACM Recommender Systems 2010, 2010.
[11] C. Leggetter and P. Woodland. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, vol. 9, no. 2, pp. 171-185, 1995.
[12] M. A. McCarthy. Bayesian Methods for Ecology. Cambridge University Press, New York, 2007.
[13] NIPS2005 Workshop Inductive Transfer: 10 Years Later, http://iitrl.acadiau.ca/itws05/
[14] I. Ntzoufras. Bayesian Modeling Using WinBUGS. Wiley, 2009.
[15] S. J. Pan and Q. Yang. A survey of transfer learning. IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2010.
[16] C. Ono, M. Kurokawa, Y. Motomura, and H. Asoh. A context-aware movie preference model using a Bayesian network for recommendation and promotion. In 11th International Conference, UM 2007, Corfu, Greece, July, 2007, Proceedings, LNCS vol. 4511, pp. 247-257, Springer-Verlag, 2007.
[17] C. Ono, Y. Takishima, Y. Motomura, and H. Asoh. Context-aware preference model based on a study of difference between real and supposed context data. In User Modeling, Adaptation, and Personalization, 17th International Conference, UMAP2009, Proceedings, LNCS vol. 5535, pp. 102-113, 2009.
[18] P. E. Rossi, G. M. Allenby, and R. McCulloch. Bayesian Statistics and Marketing. Wiley, 2005.
[19] T. Takiguchi. Statistical Acoustic Model Adaptation for Robust Speech Recognition in Noisy Reverberant Environments. Doctoral Thesis, Nara Institute of Science and Technology, 1999.
[20] S. Thrun and L. Pratt (eds.). Learning to Learn. Kluwer Academic Publishers, 1998.
[21] A. Umyarov and A. Tuzhilin. Using External Aggregate Ratings for Improving Individual Recommendations. ACM Transactions on the Web, vol. 5, no. 1, article 3, 2011.
[22] http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml.
[23] L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of SIAM Data Mining 2010 (SDM 10), 2010.
[24] K. Yu and V. Tresp. Learning to learn and collaborative filtering. In NIPS 2005 Workshop on "Inductive Transfer: 10 Years Later", 2005.