From Recordings to Recommendations: Suggesting Live
               Events in the DVR Context

                    Alessandro Basso, Marco Milanesio, André Panisson, Giancarlo Ruffo
                                                         Dipartimento di Informatica
                                                        Università degli Studi di Torino
                                                                  Torino, Italy
                                            {basso,milane,panisson,ruffo}@di.unito.it

ABSTRACT                                                                     known recommendation algorithms, that exploit user pro-
Providing valuable recommendations in the DVR domain is                      files for increasing accuracy and take into account privacy
quite straightforward when enough information about users                    issues as well.
and/or contents is known. In this work, we discuss the pos-                  Second, differently from the Video on Demand domain, the
sibility of recommending future live events without knowing                  usage of an Electronic Program Guide (EPG) is not always
anything else but past user programmed recording sched-                      assured. This fact brings two consequences: (a) there is
ules.                                                                        no knowledge on the content the user is recording and/or
                                                                             watching, and (b) there is no well defined one-to-one corre-
Categories and Subject Descriptors                                           spondence between a recording and a broadcast event. This
H.3.3 [Information Storage and Retrieval]: Information                       leads to the impossibility of directly recommending record-
Search and Retrieval—Information filtering; J.4 [Computer                    ings to users.
Applications]: Social And Behavioral Sciences
                                                                             Taking into account these considerations brings us our re-
                                                                             search question: in such a domain, is it possible to give
General Terms                                                                valuable live event recommendations to users, only consid-
Algorithms, Experimentation, Human Factors.                                  ering their recording activity on the DVR? Users have to be
                                                                             brought to contents of interest, but, differently from other
Keywords                                                                     approaches, we are not using anything but collaborative fil-
Digital Video Recorders, TV Broadcasts, Recommendation                       tering technique on users’ activity. Thus, the main contri-
Systems, Collaborative Algorithms, Implicit Data                             bution of our approach is the demonstration that this can be
                                                                             achieved without any knowledge on what is being broadcast,
1. INTRODUCTION                                                              neither EPGs nor content classifications.
A Digital Video Recorder (DVR) is a device aimed at record-
ing digital streams to a storage. DVRs can be either hard-
ware devices, such as set-top-boxes and portable media play-
                                                                             2.   RELATED WORK
                                                                             The task of recommending live events, such as TV shows,
ers, or software devices, such as web/PC-based Personal
                                                                             has been already investigated in the past years. Proposed
Video Recorders (PVRs), managing all user interactions and
                                                                             methods can exploit different ways to collect the required
personalizations. By using DVRs, users are no longer bound
                                                                             information for user profiling, as well as can make use of
to the broadcaster’s schedule, but are free to define their
                                                                             various recommendation algorithms. In particular, some ap-
personal lists of programs at any time.
                                                                             proaches, such as [6], explicitly ask the users about their
                                                                             interests and build suggestions on top of the resulting user
In order to provide a better user experience by means of fo-
                                                                             profiles. A different idea, which is adopted in several works
cused advices (e.g., recommendation of new contents), the
                                                                             [2, 9, 10], makes use of implicit feedbacks, i.e., information
arisen issues can be summarized in two main categories.
                                                                             derived from the analysis of the user behavior while using
First, some logging activity must be done to find common
                                                                             the DVR. Other solutions, as [4, 12, 15], propose recom-
usage patterns on which identify potential users’ interests.
                                                                             mender systems which make use of user’s view history as
Users are not willing to offer an explicit profile when using
                                                                             well as both explicit and implicit feedbacks. According to
a DVR, thus we do have, possibly, only a set of observations
                                                                             authors, such a mixed technique allows to obtain the best
on their activity. This is an important challenge for many
                                                                             performance.

                                                                             Another feature to tell apart existing methods for live events
                                                                             recommendation is the recommender algorithm used. A
                                                                             common approach relies on the content of the programs
                                                                             broadcast and it is therefore called content-based. Exam-
                                                                             ples in this category can be found in [10, 12]. Some authors
                                                                             devised recommenders that make use of multiple content-
Copyright is held by the author/owner(s). Workshop on the Practical Use of
Recommender Systems, Algorithms and Technologies (PRSAT 2010), held          based techniques, as in [3, 4].
in conjunction with RecSys 2010. September 30, 2010, Barcelona, Spain.       A solution able to increase novelty of recommendations is
collaborative filtering, like the works in [2, 5]. Another in-          similar on timings. Specific values are used to define the
teresting method is proposed in [9] and exploits the latent             maximum clustering distance for the start and the end times.
factor model.                                                           The output of this activity is a set of clusters, each identi-
                                                                        fying a single event. The centroid of the cluster, i.e., the
In this work, we focus on implicit feedbacks only and we use            recording that minimizes the intra-cluster timing distances,
a collaborative filtering approach to compute recommenda-               is considered the representative of the event.
tions. Our aim is to minimize the information required as
input of the recommender system, without sacrificing the                Aggregation. As the clustering occurs periodically, this
novelty. The real challenge is to be able to recommend pro-             operation aims to identify newly created events character-
grams to users without actually knowing anything about                  ized by the same channel and periodicity of the formerly
what is broadcast on TV, since no EPG is used (differently              created ones, but comparable timings. Such elements refer
from existing methods).                                                 to the same programs and are therefore merged into unique
                                                                        events, whose properties are updated by taking into account
3. DATASET                                                              the values of all the similar ones.
Faucet is a PVR integrated in a podcasting service1 , which
allows the recording and further downloading of Italian TV              Collapsing. A further refinement phase is required to grant
and Radio broadcasts [1]. The activity of the users is in-              the consistency of the generated events. In fact, due to the
crementally collected (hourly) into a log file containing the           high variability of timings, especially when a new transmis-
scheduled recordings set in the past hour as well as the oc-            sion appears, events which are initially considered as non
curred downloads. The resulting dataset is populated by real            referring to the same transmission tend to slowly and inde-
users expressing their preferences through the recorded pro-            pendently converge to more stable timeframes. This implies
grams. The dataset is publicly available at http://secnet.              the need of merging them into single events.
di.unito.it/vcast.
Each registered user can fix the desired settings for the               As a result of the processing phase, given a set of recording
recording of interest. At the end of the process, her record-           clustered together, each one with the same channel ci and
ing is scheduled for the given time and will be further avail-          the same periodicity pi , we compute a discrete event ei in
able for downloading purposes. Each recording ri , thus, is             the form of: < {ui }, t, ci , b, e, pi >, where {ui } is the set of
defined as a tuple < ui , ci , ti , bi , ei , pi > with the following   users whose recordings were clustered together; t is the user
notation: user ui sets up a recording on channel ci , starting          generated title most frequent among users in {ui }; b and e
from time bi and ending at time ei , with a title ti and a pe-          are, respectively, the starting and ending time computed as
riodicity pi (e.g., once, every Tuesday, mon-fri). In Faucet,           the median value of all the clustered recordings.
channels and periodicity values are fixed (users can choose
their ci and pi from a combobox), while all other fields are
completely up to the user.                                              4.2 From Events to Recommendations
After the end time expires, the recording is made available             When future events are computed from scheduled record-
to the user for downloading. In case of periodic events, the            ings, we are thus able to propose them to users by means of
recording step can occur an undefined number of times. Af-              two different charts: (1) a global chart returning those events
ter each recording step, the respective download is made                computed starting from the largest groups of recordings, i.e.,
available.                                                              those chosen by the largest sets of users; and (2) a user-based
                                                                        recommendation list, returning a set of new events of pos-
4. METHODOLOGY                                                          sible interest to each user requesting it, computed through
                                                                        a similarity function over the whole population. We call
In this section, we want to outline what our approach is.
                                                                        them Most Popular and Rec2 (Recordings times Recommen-
Given no knowledge on the broadcasts, we collect the users
                                                                        dations), respectively. Both charts are computed by means
activity to compute what we call discrete events, to be used
                                                                        of the memory based collaborative filtering approach named
for recommendation purposes and top chart list building.
                                                                        k -Nearest Neighbors (kNN) [14]. We apply both variants
                                                                        of the kNN algorithm: the user-based one [8], by identify-
4.1 From Recordings to Events                                           ing users interested in similar contents; and the item-based
The extraction of meaningful information from the unstruc-              approach [7], by focusing on items shared by two or more
tured amount of data contained in the dataset is essential              users.
to define a set of events which map the broadcast programs.
Through the event discovery phase, we can discretize the                In kNN, the weight (i.e., a measure of interest) of an element
continuous domain of timings defined by the recordings, cre-            ei for an user uk can be defined as:
ating the basis for the application of a recommender algo-
                                                                                                      X
rithm. The basic procedure used in the discretization was                          w(uk , ei ) =                 r(ua , ei ) · c(uk , ua ),   (1)
first introduced in [1] and covers a number of subsequent                                          ua ∈N (uk )
steps:
                                                                        where N (uk ) are the neighbors of user uk and r(ua , ei ) is
Clustering. Recordings are clustered together by consider-              equal to 1 if user ua is associated to the event ei , and 0 oth-
ing the channel, the periodicity and the difference between             erwise. The coefficient c(uk , ua ) represents the neighbor’s
starting and ending times. All recordings belonging to the              information weight for user uk . In most of the kNN-based
same cluster are thus equal as channel and periodicity, whilst          algorithms [8], the coefficient used is the similarity between
1                                                                       uk and ua .
    http://www.vcast.it/
Most Popular. The MostPopular algorithm can be defined
by means of eq. (1), assuming the number of neighbors un-                                                              100
bounded, which implies N (uk ) = U, ∀uk ∈ U ; and c(ua , ub ) =


                                                                       Probability that an element have x recordings
1, ∀ua , ub ∈ U , with U as the setP of all users. Thus, the                                                           10-1
weight is modified as w(uk , ei ) = ua r(ua , ei ).
                                                                                                                       10-2
After calculating the weight of all elements, they are sorted
in descendant order. In the MostPopular algorithm, as the
                                                                                                                       10-3
set of neighbors is independent of the user, all users re-
ceive the same recommended elements, i.e., the most popular
ones.                                                                                                                  10-4


Rec2 . In order to provide personal suggestions, we have to                                                            10-5
define a similarity function for grouping similar users (items)
from which choosing the appropriate elements to recom-                                                                 10-6 0
                                                                                                                           10   101           102            103   104
mend. Our definition of similarity is based only on implicit                                                                          number of recordings
feedbacks, resulting from observing the behavior of users: if
she records something, then we assume that she is interested      Figure 1: Number of recordings per event
in it; otherwise, we can not infer anything about the interest    (Probability density function)
of the user for that element. We are therefore considering
binary feedbacks.
                                                                  We are giving particular emphasis on the recall measure; in
Given two users u and v and the associated discrete events
                                                                  fact, since we do not have explicit feedbacks regarding the
Eu and Ev , we can choose the similarity metric, S(u, v),
                                                                  user’s interest in those items which have not been consid-
considering several well known measures (e.g., Dice, Cosine
                                                                  ered (i.e., not programmed, nor downloaded), precision is
and Matching) [11]. After choosing a metric, ∀u we can
                                                                  not very meaningful [9].
compute the subset Nu ⊆ U of neighbors of user u. A user
v such that Ev ∩ Eu 6= ∅ is thus defined as a neighbor of u.
                                                                  First, we choose different similarity functions to understand
Starting from the neighborhood of u, the similarity with u is
                                                                  whether similarity influences the results of the user-based
computed for each pair < u, v > such that v ∈ Nu . Finally,
                                                                  kNN algorithm. From Figure 2(a) it is clear that, in this
if S(u, v) > 0, we consider u similar to v. The value S(u, v)
                                                                  case, all chosen similarity metrics show nearly the same per-
is used to weight such a relation, therefore determining a
                                                                  formance.
similarity order among the neighborhood of u, from which
choosing new events to recommend to u.
                                                                  The second step is to find the optimal value for k. Figure
                                                                  2(b) shows the results with k ∈ {100, 300, 500, 700, 2000}
Similarly, this approach can be adopted for the item-based
                                                                  in user-based kNN (Dice similarity), and the MostPopu-
similarity: two events are considered similar if the share at
                                                                  lar recommender. We omit the values of k = {500, 700}
least a single user that is associated to both of them.
                                                                  since the results are almost equal to k = 300. Compared
                                                                  to the MostPopular algorithm (i.e., unbounded neighbors),
5. EVALUATION                                                     a value k = 100 is not enough to outperform it, whilst for
In the following, we evaluate the obtained results in the         k = 2000, kNN starts to converge to it. Considering the top
event extraction process and in the recommendation of new         10 recommended elements, we can achieve the best results
events to users, both in Most Popular and in Rec2 .               for k = 300, whilst k = 500 is more suitable when taking
                                                                  further elements. As in most cases 10 elements are sufficient
5.1 Event Extraction                                              for a recommendation, k = 300 offers a good trade-off be-
As a remainder, we are dealing with several independent,          tween valuable recommendations and resource consumption
user generated recording schedules, that we cluster together      for building the neighborhood.
and from which we compute the discrete events. In Figure 1,
a view of the distribution of the recordings is given: for each   A comparison among user/item-based kNN and MostPop-
detected event, the number of recordings clustered together       ular is depicted in Figure 2(c). We can observe that the
changes according to users’ activity. As it turns out, most       latter is clearly outperformed by the other two algorithms,
recordings (and, thus, most users) tend to be clustered and       especially when more than 7 recommended items are con-
aggregated on very few events, while there are lots of events     sidered. The user-based algorithm performs slightly better
with very few recordings. The Most Popular algorithm ex-          than the item-based one (more noticeable with more than
ploit these inner features of the resulting discrete events to    15 recommended items). In general, item-based algorithms
compute the top chart.                                            tend to perform better because usually the number of items
                                                                  is considerably lower than the users [14], but this property
                                                                  does not hold in our domain.
5.2 Computing Recommendation
We measure how accurate is the recommendation in predict-
ing the elements that users would program in terms of re-         6.                                              CONCLUSION
call. These values are computed as the average of all users’      In this paper we show how to recommend live events to
recall values using the top n recommended elements [13].          users without any knowledge about the broadcast content
      35%                                                                              40%                                                                   40%


      30%                                                                              35%                                                                   35%


                                                                                       30%                                                                   30%
      25%

                                                                                       25%                                                                   25%
      20%
 Recall


                                                                              Recall


                                                                                                                                                        Recall
                                                                                       20%                                                                   20%
      15%
                                                                                       15%                                                                   15%

      10%
                                                                                       10%                                                                   10%
                                                                                                                                    Most Popular
          5%                                       Dice similarity                     5%                                           k=2000                       5%                                     User-based KNN
                                                   Cosine similarity                                                                k=300                                                               Item-based KNN
                                                   Matching similarity                                                              k=100                                                               Most Popular
          0%                                                                                                                                                     0%
               0   5     10        15         20         25              30                                                                                           0   5   10        15         20      25            30
                                                                                       0%
                                                                                             0       5    10        15         20   25             30

                              top selection                                                                    top selection                                                       top selection

(a) Comparison between similarity func-                                                          (b) Recall for user-based kNN                          (c) Recall for kNN (k = 300) wrt Most-
tions in user-based kNN                                                                                                                                 Popular

                       Figure 2: Comparison between recommenders and recall for kNN and Most Popular.


and user’s likes. Recommendations can be given both glob-                                                                 [7] M. Deshpande and G. Karypis. Item-based top-n
ally and personally. It is important to underline that the                                                                    recommendation algorithms. ACM Trans. Inf. Syst.,
most popular events are easier to predict since users tend to                                                                 22(1):143–177, 2004.
naturally focus on them, even without any specific sugges-                                                                [8] J. L. Herlocker, J. A. Konstan, A. Borchers, and
tion. On the contrary, granting a high novelty in personal                                                                    J. Riedl. An algorithmic framework for performing
recommendations is a more challenging goal due to the re-                                                                     collaborative filtering. In SIGIR ’99: Proceedings of
duced amount of explicit information. Nevertheless, we can                                                                    the 22nd annual international ACM SIGIR conference
obtain interesting results even exploiting a simple approach                                                                  on Research and development in information retrieval,
as the kNN. We are currently attempting other approaches                                                                      pages 230–237, New York, NY, USA, 1999. ACM.
to recommendation (e.g., latent factor model) with implicit                                                               [9] Y. Hu, Y. Koren, and C. Volinsky. Collaborative
feedbacks, with the aim of improving the prediction accu-                                                                     filtering for implicit feedback datasets. In ICDM ’08:
racy.                                                                                                                         Proceedings of the 2008 Eighth IEEE International
                                                                                                                              Conference on Data Mining, pages 263–272,
                                                                                                                              Washington, DC, USA, 2008. IEEE Computer Society.
7. REFERENCES                                                                                                            [10] S. G. Kaushal, S. Gutta, K. Kurapati, K. Lee,
 [1] A. Basso, M. Milanesio, and G. Ruffo. Events                                                                             J. Martino, J. Milanski, J. D. Schaffer, and
     discovery for personal video recorders. In EuroITV                                                                       J. Zimmerman. Tv content recommender system. In
     ’09: Proceedings of the seventh european conference on                                                                   In Proceedings of the 17th National Conference on
     European interactive television conference, pages                                                                        Artificial Intelligence, pages 1121–1122. AAAI Press /
     171–174, New York, NY, USA, 2009. ACM.                                                                                   The MIT Press, 2000.
 [2] P. Baudisch and L. Brueckner. Tv scout: Guiding                                                                     [11] B. Markines, C. Cattuto, F. Menczer, D. Benz,
     users from printed guides to personalized tv program.                                                                    A. Hotho, and G. Stumme. Evaluating similarity
     In In Proceedings of the 2nd Workshop on                                                                                 measures for emergent semantics of social tagging. In
     Personalization in Future TV (May 28, Malaga,                                                                            WWW ’09: Proceedings of the 18th international
     Spain), Universidad de Malaga, pages 151–160, 2002.                                                                      conference on World wide web, pages 641–650, New
 [3] Y. Blanco-Fernández, J. J. Pazos-Arias, A. Gil-Solla,                                                                   York, NY, USA, 2009. ACM.
     M. Ramos-Cabrer, B. Barragáns-Martı́nez,                                                                           [12] M. Rovira, J. Gonzàlez, A. López, J. Mas, A. Puig,
     M. López-Nores, J. Garcı́a-Duque, A. Fernández-Vilas,                                                                  J. Fabregat, and G. Fernandez. Indextv: a mpeg-7
     and R. P. Dı́az-Redondo. A multi-agent open                                                                              based personalized recommendation system for digital
     architecture for a tv recommender system: A case                                                                         tv. In ICME, pages 823–826. IEEE, 2004.
     study using a bayesian strategy. Multimedia Software
                                                                                                                         [13] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. T.
     Engineering, International Symposium on, pages
                                                                                                                              Riedl. Application of dimensionality reduction in
     178–185, 2004.
                                                                                                                              recommender system - a case study. In In ACM
 [4] A. L. Buczak, J. Zimmerman, and K. Kurapati.                                                                             WebKDD Workshop, 2000.
     Personalization: Improving ease-of-use, trust and
                                                                                                                         [14] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. T.
     accuracy of a tv show recommender. In in Proceedings
                                                                                                                              Riedl. Item-based collaborative filtering
     of the TV’02 workshop on Personalization in TV,
                                                                                                                              recommendation algorithms. In WWW ’01:
     Malaga, pages 3–12, 2002.
                                                                                                                              Proceedings of the 10th international conference on
 [5] P. Cremonesi and R. Turrin. Analysis of cold-start                                                                       World Wide Web, pages 285–295, New York, NY,
     recommendations in iptv systems. In RecSys ’09:                                                                          USA, 2001. ACM.
     Proceedings of the third ACM conference on
                                                                                                                         [15] K. K. Srinivas, S. Gutta, D. Schaffer, J. Martino, and
     Recommender systems, pages 233–236, New York, NY,
                                                                                                                              J. Zimmerman. A multi-agent tv recommender. In In
     USA, 2009. ACM.
                                                                                                                              Proceedings of the UM 2001 workshop
 [6] D. Das and H. ter Horst. Recommender systems for                                                                         ”Personalization in Future TV”, 2001.
     tv. In In Proceedings of AAAI, 1998.