From Recordings to Recommendations: Suggesting Live Events in the DVR Context Alessandro Basso, Marco Milanesio, André Panisson, Giancarlo Ruffo Dipartimento di Informatica Università degli Studi di Torino Torino, Italy {basso,milane,panisson,ruffo}@di.unito.it ABSTRACT known recommendation algorithms, that exploit user pro- Providing valuable recommendations in the DVR domain is files for increasing accuracy and take into account privacy quite straightforward when enough information about users issues as well. and/or contents is known. In this work, we discuss the pos- Second, differently from the Video on Demand domain, the sibility of recommending future live events without knowing usage of an Electronic Program Guide (EPG) is not always anything else but past user programmed recording sched- assured. This fact brings two consequences: (a) there is ules. no knowledge on the content the user is recording and/or watching, and (b) there is no well defined one-to-one corre- Categories and Subject Descriptors spondence between a recording and a broadcast event. This H.3.3 [Information Storage and Retrieval]: Information leads to the impossibility of directly recommending record- Search and Retrieval—Information filtering; J.4 [Computer ings to users. Applications]: Social And Behavioral Sciences Taking into account these considerations brings us our re- search question: in such a domain, is it possible to give General Terms valuable live event recommendations to users, only consid- Algorithms, Experimentation, Human Factors. ering their recording activity on the DVR? Users have to be brought to contents of interest, but, differently from other Keywords approaches, we are not using anything but collaborative fil- Digital Video Recorders, TV Broadcasts, Recommendation tering technique on users’ activity. Thus, the main contri- Systems, Collaborative Algorithms, Implicit Data bution of our approach is the demonstration that this can be achieved without any knowledge on what is being broadcast, 1. INTRODUCTION neither EPGs nor content classifications. A Digital Video Recorder (DVR) is a device aimed at record- ing digital streams to a storage. DVRs can be either hard- ware devices, such as set-top-boxes and portable media play- 2. RELATED WORK The task of recommending live events, such as TV shows, ers, or software devices, such as web/PC-based Personal has been already investigated in the past years. Proposed Video Recorders (PVRs), managing all user interactions and methods can exploit different ways to collect the required personalizations. By using DVRs, users are no longer bound information for user profiling, as well as can make use of to the broadcaster’s schedule, but are free to define their various recommendation algorithms. In particular, some ap- personal lists of programs at any time. proaches, such as [6], explicitly ask the users about their interests and build suggestions on top of the resulting user In order to provide a better user experience by means of fo- profiles. A different idea, which is adopted in several works cused advices (e.g., recommendation of new contents), the [2, 9, 10], makes use of implicit feedbacks, i.e., information arisen issues can be summarized in two main categories. derived from the analysis of the user behavior while using First, some logging activity must be done to find common the DVR. Other solutions, as [4, 12, 15], propose recom- usage patterns on which identify potential users’ interests. mender systems which make use of user’s view history as Users are not willing to offer an explicit profile when using well as both explicit and implicit feedbacks. According to a DVR, thus we do have, possibly, only a set of observations authors, such a mixed technique allows to obtain the best on their activity. This is an important challenge for many performance. Another feature to tell apart existing methods for live events recommendation is the recommender algorithm used. A common approach relies on the content of the programs broadcast and it is therefore called content-based. Exam- ples in this category can be found in [10, 12]. Some authors devised recommenders that make use of multiple content- Copyright is held by the author/owner(s). Workshop on the Practical Use of Recommender Systems, Algorithms and Technologies (PRSAT 2010), held based techniques, as in [3, 4]. in conjunction with RecSys 2010. September 30, 2010, Barcelona, Spain. A solution able to increase novelty of recommendations is collaborative filtering, like the works in [2, 5]. Another in- similar on timings. Specific values are used to define the teresting method is proposed in [9] and exploits the latent maximum clustering distance for the start and the end times. factor model. The output of this activity is a set of clusters, each identi- fying a single event. The centroid of the cluster, i.e., the In this work, we focus on implicit feedbacks only and we use recording that minimizes the intra-cluster timing distances, a collaborative filtering approach to compute recommenda- is considered the representative of the event. tions. Our aim is to minimize the information required as input of the recommender system, without sacrificing the Aggregation. As the clustering occurs periodically, this novelty. The real challenge is to be able to recommend pro- operation aims to identify newly created events character- grams to users without actually knowing anything about ized by the same channel and periodicity of the formerly what is broadcast on TV, since no EPG is used (differently created ones, but comparable timings. Such elements refer from existing methods). to the same programs and are therefore merged into unique events, whose properties are updated by taking into account 3. DATASET the values of all the similar ones. Faucet is a PVR integrated in a podcasting service1 , which allows the recording and further downloading of Italian TV Collapsing. A further refinement phase is required to grant and Radio broadcasts [1]. The activity of the users is in- the consistency of the generated events. In fact, due to the crementally collected (hourly) into a log file containing the high variability of timings, especially when a new transmis- scheduled recordings set in the past hour as well as the oc- sion appears, events which are initially considered as non curred downloads. The resulting dataset is populated by real referring to the same transmission tend to slowly and inde- users expressing their preferences through the recorded pro- pendently converge to more stable timeframes. This implies grams. The dataset is publicly available at http://secnet. the need of merging them into single events. di.unito.it/vcast. Each registered user can fix the desired settings for the As a result of the processing phase, given a set of recording recording of interest. At the end of the process, her record- clustered together, each one with the same channel ci and ing is scheduled for the given time and will be further avail- the same periodicity pi , we compute a discrete event ei in able for downloading purposes. Each recording ri , thus, is the form of: < {ui }, t, ci , b, e, pi >, where {ui } is the set of defined as a tuple < ui , ci , ti , bi , ei , pi > with the following users whose recordings were clustered together; t is the user notation: user ui sets up a recording on channel ci , starting generated title most frequent among users in {ui }; b and e from time bi and ending at time ei , with a title ti and a pe- are, respectively, the starting and ending time computed as riodicity pi (e.g., once, every Tuesday, mon-fri). In Faucet, the median value of all the clustered recordings. channels and periodicity values are fixed (users can choose their ci and pi from a combobox), while all other fields are completely up to the user. 4.2 From Events to Recommendations After the end time expires, the recording is made available When future events are computed from scheduled record- to the user for downloading. In case of periodic events, the ings, we are thus able to propose them to users by means of recording step can occur an undefined number of times. Af- two different charts: (1) a global chart returning those events ter each recording step, the respective download is made computed starting from the largest groups of recordings, i.e., available. those chosen by the largest sets of users; and (2) a user-based recommendation list, returning a set of new events of pos- 4. METHODOLOGY sible interest to each user requesting it, computed through a similarity function over the whole population. We call In this section, we want to outline what our approach is. them Most Popular and Rec2 (Recordings times Recommen- Given no knowledge on the broadcasts, we collect the users dations), respectively. Both charts are computed by means activity to compute what we call discrete events, to be used of the memory based collaborative filtering approach named for recommendation purposes and top chart list building. k -Nearest Neighbors (kNN) [14]. We apply both variants of the kNN algorithm: the user-based one [8], by identify- 4.1 From Recordings to Events ing users interested in similar contents; and the item-based The extraction of meaningful information from the unstruc- approach [7], by focusing on items shared by two or more tured amount of data contained in the dataset is essential users. to define a set of events which map the broadcast programs. Through the event discovery phase, we can discretize the In kNN, the weight (i.e., a measure of interest) of an element continuous domain of timings defined by the recordings, cre- ei for an user uk can be defined as: ating the basis for the application of a recommender algo- X rithm. The basic procedure used in the discretization was w(uk , ei ) = r(ua , ei ) · c(uk , ua ), (1) first introduced in [1] and covers a number of subsequent ua ∈N (uk ) steps: where N (uk ) are the neighbors of user uk and r(ua , ei ) is Clustering. Recordings are clustered together by consider- equal to 1 if user ua is associated to the event ei , and 0 oth- ing the channel, the periodicity and the difference between erwise. The coefficient c(uk , ua ) represents the neighbor’s starting and ending times. All recordings belonging to the information weight for user uk . In most of the kNN-based same cluster are thus equal as channel and periodicity, whilst algorithms [8], the coefficient used is the similarity between 1 uk and ua . http://www.vcast.it/ Most Popular. The MostPopular algorithm can be defined by means of eq. (1), assuming the number of neighbors un- 100 bounded, which implies N (uk ) = U, ∀uk ∈ U ; and c(ua , ub ) = Probability that an element have x recordings 1, ∀ua , ub ∈ U , with U as the setP of all users. Thus, the 10-1 weight is modified as w(uk , ei ) = ua r(ua , ei ). 10-2 After calculating the weight of all elements, they are sorted in descendant order. In the MostPopular algorithm, as the 10-3 set of neighbors is independent of the user, all users re- ceive the same recommended elements, i.e., the most popular ones. 10-4 Rec2 . In order to provide personal suggestions, we have to 10-5 define a similarity function for grouping similar users (items) from which choosing the appropriate elements to recom- 10-6 0 10 101 102 103 104 mend. Our definition of similarity is based only on implicit number of recordings feedbacks, resulting from observing the behavior of users: if she records something, then we assume that she is interested Figure 1: Number of recordings per event in it; otherwise, we can not infer anything about the interest (Probability density function) of the user for that element. We are therefore considering binary feedbacks. We are giving particular emphasis on the recall measure; in Given two users u and v and the associated discrete events fact, since we do not have explicit feedbacks regarding the Eu and Ev , we can choose the similarity metric, S(u, v), user’s interest in those items which have not been consid- considering several well known measures (e.g., Dice, Cosine ered (i.e., not programmed, nor downloaded), precision is and Matching) [11]. After choosing a metric, ∀u we can not very meaningful [9]. compute the subset Nu ⊆ U of neighbors of user u. A user v such that Ev ∩ Eu 6= ∅ is thus defined as a neighbor of u. First, we choose different similarity functions to understand Starting from the neighborhood of u, the similarity with u is whether similarity influences the results of the user-based computed for each pair < u, v > such that v ∈ Nu . Finally, kNN algorithm. From Figure 2(a) it is clear that, in this if S(u, v) > 0, we consider u similar to v. The value S(u, v) case, all chosen similarity metrics show nearly the same per- is used to weight such a relation, therefore determining a formance. similarity order among the neighborhood of u, from which choosing new events to recommend to u. The second step is to find the optimal value for k. Figure 2(b) shows the results with k ∈ {100, 300, 500, 700, 2000} Similarly, this approach can be adopted for the item-based in user-based kNN (Dice similarity), and the MostPopu- similarity: two events are considered similar if the share at lar recommender. We omit the values of k = {500, 700} least a single user that is associated to both of them. since the results are almost equal to k = 300. Compared to the MostPopular algorithm (i.e., unbounded neighbors), 5. EVALUATION a value k = 100 is not enough to outperform it, whilst for In the following, we evaluate the obtained results in the k = 2000, kNN starts to converge to it. Considering the top event extraction process and in the recommendation of new 10 recommended elements, we can achieve the best results events to users, both in Most Popular and in Rec2 . for k = 300, whilst k = 500 is more suitable when taking further elements. As in most cases 10 elements are sufficient 5.1 Event Extraction for a recommendation, k = 300 offers a good trade-off be- As a remainder, we are dealing with several independent, tween valuable recommendations and resource consumption user generated recording schedules, that we cluster together for building the neighborhood. and from which we compute the discrete events. In Figure 1, a view of the distribution of the recordings is given: for each A comparison among user/item-based kNN and MostPop- detected event, the number of recordings clustered together ular is depicted in Figure 2(c). We can observe that the changes according to users’ activity. As it turns out, most latter is clearly outperformed by the other two algorithms, recordings (and, thus, most users) tend to be clustered and especially when more than 7 recommended items are con- aggregated on very few events, while there are lots of events sidered. The user-based algorithm performs slightly better with very few recordings. The Most Popular algorithm ex- than the item-based one (more noticeable with more than ploit these inner features of the resulting discrete events to 15 recommended items). In general, item-based algorithms compute the top chart. tend to perform better because usually the number of items is considerably lower than the users [14], but this property does not hold in our domain. 5.2 Computing Recommendation We measure how accurate is the recommendation in predict- ing the elements that users would program in terms of re- 6. CONCLUSION call. These values are computed as the average of all users’ In this paper we show how to recommend live events to recall values using the top n recommended elements [13]. users without any knowledge about the broadcast content 35% 40% 40% 30% 35% 35% 30% 30% 25% 25% 25% 20% Recall Recall Recall 20% 20% 15% 15% 15% 10% 10% 10% Most Popular 5% Dice similarity 5% k=2000 5% User-based KNN Cosine similarity k=300 Item-based KNN Matching similarity k=100 Most Popular 0% 0% 0 5 10 15 20 25 30 0 5 10 15 20 25 30 0% 0 5 10 15 20 25 30 top selection top selection top selection (a) Comparison between similarity func- (b) Recall for user-based kNN (c) Recall for kNN (k = 300) wrt Most- tions in user-based kNN Popular Figure 2: Comparison between recommenders and recall for kNN and Most Popular. and user’s likes. Recommendations can be given both glob- [7] M. Deshpande and G. Karypis. Item-based top-n ally and personally. It is important to underline that the recommendation algorithms. ACM Trans. Inf. Syst., most popular events are easier to predict since users tend to 22(1):143–177, 2004. naturally focus on them, even without any specific sugges- [8] J. L. Herlocker, J. A. Konstan, A. Borchers, and tion. On the contrary, granting a high novelty in personal J. Riedl. An algorithmic framework for performing recommendations is a more challenging goal due to the re- collaborative filtering. In SIGIR ’99: Proceedings of duced amount of explicit information. Nevertheless, we can the 22nd annual international ACM SIGIR conference obtain interesting results even exploiting a simple approach on Research and development in information retrieval, as the kNN. We are currently attempting other approaches pages 230–237, New York, NY, USA, 1999. ACM. to recommendation (e.g., latent factor model) with implicit [9] Y. Hu, Y. Koren, and C. Volinsky. Collaborative feedbacks, with the aim of improving the prediction accu- filtering for implicit feedback datasets. In ICDM ’08: racy. Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pages 263–272, Washington, DC, USA, 2008. IEEE Computer Society. 7. REFERENCES [10] S. G. Kaushal, S. Gutta, K. Kurapati, K. Lee, [1] A. Basso, M. Milanesio, and G. Ruffo. Events J. Martino, J. Milanski, J. D. Schaffer, and discovery for personal video recorders. In EuroITV J. Zimmerman. Tv content recommender system. In ’09: Proceedings of the seventh european conference on In Proceedings of the 17th National Conference on European interactive television conference, pages Artificial Intelligence, pages 1121–1122. AAAI Press / 171–174, New York, NY, USA, 2009. ACM. The MIT Press, 2000. [2] P. Baudisch and L. Brueckner. Tv scout: Guiding [11] B. Markines, C. Cattuto, F. Menczer, D. Benz, users from printed guides to personalized tv program. A. Hotho, and G. Stumme. Evaluating similarity In In Proceedings of the 2nd Workshop on measures for emergent semantics of social tagging. In Personalization in Future TV (May 28, Malaga, WWW ’09: Proceedings of the 18th international Spain), Universidad de Malaga, pages 151–160, 2002. conference on World wide web, pages 641–650, New [3] Y. Blanco-Fernández, J. J. Pazos-Arias, A. Gil-Solla, York, NY, USA, 2009. ACM. M. Ramos-Cabrer, B. Barragáns-Martı́nez, [12] M. Rovira, J. Gonzàlez, A. López, J. Mas, A. Puig, M. López-Nores, J. Garcı́a-Duque, A. Fernández-Vilas, J. Fabregat, and G. Fernandez. Indextv: a mpeg-7 and R. P. Dı́az-Redondo. A multi-agent open based personalized recommendation system for digital architecture for a tv recommender system: A case tv. In ICME, pages 823–826. IEEE, 2004. study using a bayesian strategy. Multimedia Software [13] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. T. Engineering, International Symposium on, pages Riedl. Application of dimensionality reduction in 178–185, 2004. recommender system - a case study. In In ACM [4] A. L. Buczak, J. Zimmerman, and K. Kurapati. WebKDD Workshop, 2000. Personalization: Improving ease-of-use, trust and [14] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. T. accuracy of a tv show recommender. In in Proceedings Riedl. Item-based collaborative filtering of the TV’02 workshop on Personalization in TV, recommendation algorithms. In WWW ’01: Malaga, pages 3–12, 2002. Proceedings of the 10th international conference on [5] P. Cremonesi and R. Turrin. Analysis of cold-start World Wide Web, pages 285–295, New York, NY, recommendations in iptv systems. In RecSys ’09: USA, 2001. ACM. Proceedings of the third ACM conference on [15] K. K. Srinivas, S. Gutta, D. Schaffer, J. Martino, and Recommender systems, pages 233–236, New York, NY, J. Zimmerman. A multi-agent tv recommender. In In USA, 2009. ACM. Proceedings of the UM 2001 workshop [6] D. Das and H. ter Horst. Recommender systems for ”Personalization in Future TV”, 2001. tv. In In Proceedings of AAAI, 1998.