<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Co-views Information to Learn Lecture Recommendations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haibin Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sujatha Das</string-name>
          <email>gsdas@cse</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dongwon Lee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prasenjit Mitra</string-name>
          <email>pmitra@ist</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C. Lee Giles</string-name>
          <email>giles@ist</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The Pennsylvania State University</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>71</fpage>
      <lpage>82</lpage>
      <abstract>
        <p>Content-based methods are commonly used to address the cold-start problem in recommender systems. In the cold-start scenario, usage information about an item, preference information about a user, or both are unavailable because the item or the user is new to the system. Collaborative filtering strategies therefore cannot be employed; instead, item-specific attributes or user profile information are used to make recommendations. We focus on recommending lectures from videolectures.net, using the data made available as part of the ECML/PKDD Discovery Challenge. We propose using co-view information from previously seen lecture pairs to learn the weights of lecture attributes for ranking lectures in the cold-start recommendation task. Co-viewed triplet and pair information is also used to estimate the probability that a lecture will be seen, given a set of previously seen lectures. Our results corroborate the effectiveness of co-view information for learning lecture recommendations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Given a set of users and a set of items, the goal of a recommender system is to predict
the items a particular user is most likely to be interested in. Recommending products to
users of a shopping website like Amazon, predicting the rating a user is likely to assign
to a movie, and predicting the citations a paper is likely to make are common scenarios
where automatic recommender systems are desirable [
        <xref ref-type="bibr" rid="ref11 ref14 ref4">11, 4, 14</xref>
        ].
      </p>
      <p>
        We focus on recommending lectures from videolectures.net, an open-access repository
of educational lectures. Lectures given by prominent researchers and scholars at
conferences and other academic events are made available on this website for educational
purposes. The 2011 ECML/PKDD Discovery Challenge involved two recommendation tasks using
lectures from this website. Figure 1 shows a snapshot of the existing system at
videolectures.net and indicates some of the information available with each lecture.
Most lectures include information on the language in which the lecture was given, the
content of the slides, the category (discipline area) of the lecture, etc. Sometimes,
additional information, such as a description of the event (e.g., a conference or workshop)
at which the lecture was given and the authors' affiliations, is also available. The
training data for the ECML/PKDD challenge contains a subset of lectures from
videolectures.net. Along with lecture, author and event attributes, the training data
includes pairs and triplets of lectures that were frequently co-viewed in the past. The
datasets are described in more detail in Section 3 and [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Task 1 of the challenge pertains to the cold-start scenario, where recommendations
are sought for new lectures. In this task, we are given a set of training lectures Q and a
set of co-viewed pairs P = {(l1, l2, f) | l1, l2 ∈ Q}, where f is the number of times l1 and
l2 were co-viewed. The test set T contains lectures without any viewing history, and the
task requires participants to recommend lectures Rq ⊂ T for each query lecture q ∈ Q′,
where Q′ ⊂ Q. This task simulates the scenario in which recommendations must be made for a
new user or a new lecture for which no co-view history is available.</p>
      <p>Task 2 of the challenge simulates a typical recommender-system scenario: recommendations
are sought for the lectures that are likely to be viewed next, given three lectures from a
stream of previously viewed lectures. The training data for this task includes triplets
T<sub>left</sub> = {(tid, l1, l2, l3, f<sub>l</sub>) | l1, l2, l3 ∈ Q}, where Q is the set of
lectures, f<sub>l</sub> is the frequency with which l1, l2 and l3 appeared together in click
streams, and tid is an identifier for the triplet. The data also includes, for each triplet,
the lectures with the highest co-view frequencies, T<sub>right</sub> = {(tid, l, f<sub>r</sub>) | l ∈ Q},
where f<sub>r</sub> is the co-view frequency of lecture l with triplet tid. Given a list of query
triplets T<sub>left</sub><sup>query</sup>, task 2 involves predicting the lectures that are most
likely to be viewed next, given that each lecture in a triplet t ∈ T<sub>left</sub><sup>query</sup>
was seen.</p>
      <p>Our solutions to tasks 1 and 2 make use of the lecture co-view information available in
the training data. For the cold-start scenario of task 1, we adopt a content-based approach
in which co-view information is used to learn the feature weights for ranking lectures.
We use the co-viewed lecture pairs to form training instances for a supervised learning
setup and train a Support Vector Machine (SVM), whose learnt feature weights indicate the
importance of each lecture attribute for recommending lectures in the cold-start scenario.
For task 2, which involves making recommendations based on a set of previously seen
lectures, we propose a scoring technique that uses concepts from item-set mining to estimate
the likelihood that a lecture will be seen next. Our solutions based on these strategies
were competitive with the top-performing systems in the Discovery Challenge: in the final
leaderboard rankings, our system was ranked 8th among 62 participants for task 1 and 4th
among 22 participants for task 2.</p>
      <p>The remainder of this paper is organized as follows. We briefly summarize the previous
work most related to our approach in Section 2. Section 3 describes our solution and
experiments for task 1, whereas Section 4 describes our algorithm for task 2 and its
performance. Section 5 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Recommendation strategies can be broadly classified into collaborative filtering and
content-based strategies. We briefly describe the basic ideas behind these approaches
and refer to surveys for further details. Collaborative filtering (CF) methods use
previous user-item history to generate lists of recommendations. For example, in movie
recommendation, CF strategies use the ratings previously submitted by other users to
predict the rating a user might assign to a movie, based on user similarity or movie
similarity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In addition to historical information, content-based approaches use the properties
and attributes of users and items to make personalized recommendations [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Content-based methods are commonly used to address the cold-start problem, where
rating and preference information is unavailable or sparse.
      </p>
      <p>
        Hybrid strategies have recently been used to leverage the benefits of both collaborative
filtering and content-based strategies. For example, to tackle the cold-start problem,
Gantner et al. used collaborative information to compute the similarity between existing
items or users via matrix factorization, and then proposed mapping techniques, such as a
linear combination of the attributes of new items, to fit new content into the same model [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Our technique for learning feature weights for content-based recommendations is closest
to the techniques adopted by Strohman et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and Debnath et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In contrast to the regression framework they adopted, we formulate attribute-weight
learning for the cold-start recommendations of task 1 as a classification problem. For
task 2, we design an algorithm that makes use of the fundamental concepts of support and
confidence from item-set mining [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Task 1: Learning Attribute Weights for Cold-Start Recommendations</p>
      <p>We briefly summarize the attribute information available with the data provided for the
challenge in Table 1.
The pairs information available for task 1 gives the frequency with which each lecture
pair was co-viewed. This information is valuable for understanding which attributes are
shared by pairs of lectures that are often co-viewed. For instance, it is reasonable to
expect that a highly co-viewed pair of lectures is in the same language and perhaps in the
same category. Similarly, a frequently co-viewed pair is likely to be on related topics,
such as two lectures presented at the same conference or two parts of a tutorial on one
topic. It is also intuitive to expect the co-view frequencies of lectures belonging to
diverse categories, such as Graph Theory and Ecology, to be small. Based on these
intuitions, we designed the following features to measure the similarity between two
lectures in terms of their attributes.</p>
      <p>1. Co-author similarity. This feature indicates whether two lectures have the same
author. It has the value 1 when the two lectures share an author and 0 otherwise.</p>
      <p>2. Type similarity. This feature has the value 1 when the two lectures have the same type
and 0 otherwise. Example lecture types include lecture, keynote, thesis proposal and
tutorial.</p>
      <p>3. Language similarity. This feature has the value 1 when the two lectures are in the same
language and 0 otherwise.</p>
      <p>4. Event similarity. A value of 1 or 0 indicates whether or not the two lectures belong to
the same event, such as a conference or workshop series. In addition to this boolean-valued
feature, we used the description fields associated with events to compute a similarity value
with the cosine similarity function. This score is meant to capture events that are similar
though not identical. For instance, ECML and ICML are related despite being distinct venues,
since both are machine learning conferences. Similarly, lectures from the same conference
venue but presented in different years are related.</p>
      <p>5. Category similarity. The category information pertains to the subject area assigned to
a lecture. The categories used by videolectures.net are those available in Wikipedia, where
connections between categories form a directed graph that can be used to compute similarity.
For instance, if two lectures are assigned the categories “Computer Science” and “Graph
Theory”, they share some commonality, since “Graph Theory” is a sub-category of “Computer
Science”. To capture this aspect, we used four binary indicators of category similarity
between two lectures l1 and l2:
– C1: 1 if l1.categories ∩ l2.categories ≠ ∅ and 0 otherwise.
– C2: 1 if l1.categories ∩ l2.parent categories ≠ ∅ and 0 otherwise.
– C3: 1 if l1.parent categories ∩ l2.categories ≠ ∅ and 0 otherwise.
– C4: 1 if l1.parent categories ∩ l2.parent categories ≠ ∅ and 0 otherwise.</p>
      <p>6. Text similarity. The name, titles and description fields of a lecture contain textual
content. We represent these fields as TF-IDF [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] vectors and use the cosine similarity of the corresponding fields of two lectures as
features.</p>
      <p>7. Topic similarity. We use Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a popular model for representing documents as mixtures of topics. The generative
process in LDA expresses each document in terms of its topic proportions. We modeled the
training set of lectures (name+description+titles) using 1000 topics and obtained the topic
proportions for each lecture. The similarity between a pair of lectures can be computed
using the cosine similarity between their topic vectors or by measuring the overlap among
the top topics of each lecture. We used the Jaccard coefficient [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to compute a similarity score based on the overlap among the top-10 topics of the two
lectures.</p>
      <p>8. Affiliation similarity. The author affiliation information is also available for
lectures. We compute the similarity between two affiliations with the Jaccard similarity
measure on the sets of words describing the affiliations.</p>
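      <p>The features above can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than our implementation: the dictionary field names (authors, type, categories, parent_categories, top_topics and so on) are assumptions, not the challenge schema; TF-IDF is computed as raw term count times log(N/df), one common variant; the top LDA topic ids are assumed to be precomputed; and the event-description, title, description and slide features are omitted for brevity.</p>

```python
import math
from collections import Counter


def binary(a, b):
    """1.0 if two non-empty attribute values are equal, else 0.0."""
    return 1.0 if a and a == b else 0.0


def overlaps(a, b):
    """1.0 if two collections share at least one element, else 0.0."""
    return 0.0 if set(a).isdisjoint(b) else 1.0


def jaccard(a, b):
    """Jaccard coefficient of two collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def tfidf(tokens, df, n_docs):
    """Sparse TF-IDF vector: raw term count times log(n_docs / document frequency)."""
    counts = Counter(tokens)
    return {t: c * math.log(n_docs / df[t]) for t, c in counts.items() if df.get(t)}


def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(wt * v.get(t, 0.0) for t, wt in u.items())
    nu = math.sqrt(sum(wt * wt for wt in u.values()))
    nv = math.sqrt(sum(wt * wt for wt in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def pair_features(l1, l2, df, n_docs):
    """Similarity features for a lecture pair (a subset of features 1-8 above)."""
    return {
        "coauthor": overlaps(l1["authors"], l2["authors"]),
        "type": binary(l1["type"], l2["type"]),
        "language": binary(l1["language"], l2["language"]),
        "event": binary(l1["event"], l2["event"]),
        # C1-C4: overlaps between each lecture's categories and parent categories
        "C1": overlaps(l1["categories"], l2["categories"]),
        "C2": overlaps(l1["categories"], l2["parent_categories"]),
        "C3": overlaps(l1["parent_categories"], l2["categories"]),
        "C4": overlaps(l1["parent_categories"], l2["parent_categories"]),
        "name_tfidf": cosine(tfidf(l1["name"].split(), df, n_docs),
                             tfidf(l2["name"].split(), df, n_docs)),
        # Jaccard overlap of the top-10 LDA topic ids (computed beforehand)
        "lda_top10": jaccard(l1["top_topics"][:10], l2["top_topics"][:10]),
        "affiliation": jaccard(l1["affiliation"].lower().split(),
                               l2["affiliation"].lower().split()),
    }


# Toy usage with made-up lectures and document frequencies.
a = {"authors": ["A. Smith"], "type": "lecture", "language": "en", "event": "icml2010",
     "categories": ["Graph Theory"], "parent_categories": ["Computer Science"],
     "name": "spectral graph theory", "top_topics": [3, 7, 9],
     "affiliation": "Penn State University"}
b = {"authors": ["B. Jones"], "type": "lecture", "language": "en", "event": "ecml2010",
     "categories": ["Computer Science"], "parent_categories": [],
     "name": "graph mining", "top_topics": [7, 9, 12],
     "affiliation": "Stanford University"}
DF = {"spectral": 1, "graph": 2, "theory": 1, "mining": 1}
feats = pair_features(a, b, DF, n_docs=3)
```

      <p>Each value lies in [0, 1], so the learnt weights of the classifier described next are directly comparable across features.</p>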
      <sec id="sec-2-1">
        <title>Learning attribute weights for pairwise prediction</title>
        <p>
          Support Vector Machines (SVMs) are a discriminative supervised learning approach widely
used for classification and regression problems in many areas. For binary classification,
where the class labels are restricted to +1 and -1, an SVM learns a maximum-margin
hyperplane separating the training examples of the two classes. At test time, the signed
distance between a given instance and this hyperplane is computed and used to assign a
predicted label. We formulate the recommendation task for the cold-start scenario as a
binary classification problem. We treat the co-viewed lecture pairs available in the
training data as positive examples. Negative instances for training the classifier are
obtained by randomly selecting lecture pairs that were never co-viewed in the training
data. The features described in Section 3.1 were used to train an SVM classifier. Task 1
provides query lecture ids (say, the set Q) for which recommendations are to be predicted
from a given set of test lectures (say, the set T). We pair each q ∈ Q with each t ∈ T and
score the pair with the trained SVM classifier, namely by the signed distance from the pair
instance to the hyperplane. The final list of predictions for each query is obtained by
sorting these pairs by score and choosing the test lectures corresponding to the top pairs.
When trained with a linear kernel, an SVM learns a weight vector w and an offset b that
maximize the margin while penalizing violations of the following constraints imposed by the
training data:
<disp-formula><tex-math>y_i(\mathbf{w}\cdot\mathbf{x}_i - b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad 1 \leq i \leq n</tex-math></disp-formula>
In the above formula, i indexes the training examples, x<sub>i</sub> is the feature vector of
a given example, y<sub>i</sub> is its label (+1 or -1), and ξ<sub>i</sub> is a slack variable
measuring how far the example violates the margin [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In our case, the feature values are similarity values based on the different
attributes of a lecture pair. That is, by learning the classifier we are, in effect, learning
a scoring function for lecture pairs (li, lj) based on a linear combination of the individual
attribute similarity values such that
        </p>
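        <p>
<disp-formula><tex-math>\mathrm{score}(l_i, l_j) = \sum_{f=1}^{F} w_f \times \mathrm{sim}_f\left(l_i^{f}, l_j^{f}\right)</tex-math></disp-formula>
where F is the number of attribute-based features, l<sub>i</sub><sup>f</sup> denotes the value of
attribute f for lecture l<sub>i</sub>, and w<sub>f</sub> is the weight assigned to the similarity
value sim<sub>f</sub> based on attribute f of the lecture pair. Since the offset b is the same for
all pairs, ranking pairs by their signed distance to the hyperplane is equivalent to ranking
them by this score.</p>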
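        <p>The training and scoring steps can be sketched in pure Python under stated assumptions. This is not the code we ran: as a stand-in for SVMLight, the sketch uses a Pegasos-style stochastic subgradient solver for the primal linear-SVM objective (λ/2)‖w‖² plus the mean hinge loss, so its weights will differ numerically from SVMLight's. The names train_linear_svm and rank_candidates, and the featurize callback that returns the similarity vector of a (query, candidate) pair, are hypothetical.</p>

```python
import random


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on (lam/2)||w||^2 + mean hinge loss.
    A constant 1.0 feature is appended, so the last weight plays the role of -b."""
    rng = random.Random(seed)
    data = [(list(xi) + [1.0], yi) for xi, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = 1.0 > label * dot(w, x)  # hinge loss is active for this example
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrinkage from the regularizer
            if violated:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def rank_candidates(w, query, candidates, featurize, top_k=30):
    """Score each (query, candidate) pair by w.x and return the top_k candidates."""
    scored = [(dot(w, list(featurize(query, c)) + [1.0]), c) for c in candidates]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for _, c in scored[:top_k]]


# Toy usage: two similarity features per pair; never-co-viewed pairs share nothing.
X = [[1.0, 0.9], [1.0, 0.7], [0.8, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
FEATS = {"t1": [1.0, 1.0], "t2": [0.0, 0.0], "t3": [0.5, 0.5]}
print(rank_candidates(w, "q", ["t2", "t1", "t3"], lambda q, c: FEATS[c]))
# ['t1', 't3', 't2']
```

        <p>Folding the offset into w through a constant feature is a common simplification that also regularizes the offset. In this objective, λ corresponds roughly to 1/(nC) for n training pairs.</p>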
      </sec>
      <sec id="sec-2-2">
        <title>Experiments and Observations</title>
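        <p>The challenge evaluates recommendation performance with a variant of R-precision,
averaged over the set of all queries R:</p>
        <p>
<disp-formula><tex-math>\mathrm{MAR}_p = \frac{1}{|R|} \sum_{r \in R} \mathrm{AvgR}_p(r)</tex-math></disp-formula>
Here the average R-precision for a single recommended ranked list is
<disp-formula><tex-math>\mathrm{AvgR}_p(r) = \frac{1}{|Z|} \sum_{z \in Z} R_p@z(r), \qquad R_p@z(r) = \frac{|\mathit{relevant} \cap \mathit{retrieved}|_z}{|\mathit{relevant}|_z}</tex-math></disp-formula>
where R<sub>p</sub>@z(r) is the R-precision at cut-off length z, with
Z = {5, 10, 15, 20, 25, 30} for task 1 and Z = {5, 10} for task 2.</p>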
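        <p>As a hedged illustration, MARp can be computed as below. We assume that |·|<sub>z</sub> means truncating both the ground-truth (relevant) and the submitted (retrieved) ranked lists to their first z items, and that a query with an empty ground-truth list contributes 0; the official challenge evaluation code may differ in such details. The function names are hypothetical.</p>

```python
def rp_at(relevant, retrieved, z):
    """R-precision at cut-off z: overlap of the top-z relevant and top-z
    retrieved lectures, divided by the number of top-z relevant lectures."""
    rel_z = relevant[:z]
    if not rel_z:
        return 0.0
    return len(set(rel_z).intersection(retrieved[:z])) / len(rel_z)


def avg_rp(relevant, retrieved, cutoffs):
    """AvgRp(r): mean of Rp@z(r) over the cut-off set Z."""
    return sum(rp_at(relevant, retrieved, z) for z in cutoffs) / len(cutoffs)


def mar_p(queries, cutoffs):
    """MARp: mean AvgRp over all queries; queries maps id -> (relevant, retrieved)."""
    return sum(avg_rp(rel, ret, cutoffs) for rel, ret in queries.values()) / len(queries)


TASK1_CUTOFFS = (5, 10, 15, 20, 25, 30)
TASK2_CUTOFFS = (5, 10)

relevant = list(range(10))
retrieved = [0, 1, 2, 20, 21, 5, 6, 7, 8, 30]
print(round(mar_p({"q1": (relevant, retrieved)}, TASK2_CUTOFFS), 4))  # 0.65
```

        <p>In this example, Rp@5 = 3/5 and Rp@10 = 7/10, so AvgRp for the single query is 0.65.</p>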
        <p>To train the SVM classifier for task 1, we filtered the training set P, keeping only
pairs whose co-view frequency accounts for at least 5% of the total co-view frequency of at
least one of their two lectures:
<disp-formula><tex-math>P' = \left\{ (l_1, l_2, f) \mid (l_1, l_2, f) \in P,\ \frac{f}{S(l_1)} \geq 0.05 \lor \frac{f}{S(l_2)} \geq 0.05 \right\}</tex-math></disp-formula>
where
<disp-formula><tex-math>S(l_1) = \sum_{(l_1, l_i, f_{1i}) \in P} f_{1i}, \qquad S(l_2) = \sum_{(l_j, l_2, f_{2j}) \in P} f_{2j}.</tex-math></disp-formula>
These pairs were assigned the class label +1. For negative instances, we randomly sampled
a comparable number of lecture pairs that do not appear in the training pairs set P. In
total, we had a balanced data set of about 40,000 pairs for training the classifier. We
used the SVMLight implementation provided by Joachims [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. We set the margin-loss penalty parameter C to 10 after experimenting with values
between 0.1 and 100; performance on a validation set was best for values of C between 5 and
20. To show the stability of the feature weights, Table 2 reports their mean and variance
over five folds of training runs.</p>
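        <p>The construction of the balanced training set can be sketched as follows. The sketch assumes that pairs are unordered, so S(l) sums the frequencies of every pair that contains l, and that frequencies are positive; the helper names filter_pairs and sample_negatives are hypothetical.</p>

```python
import random
from collections import defaultdict


def total_coviews(pairs):
    """S(l): total co-view frequency of lecture l over all pairs containing it
    (pairs are treated as unordered, an assumption of this sketch)."""
    s = defaultdict(int)
    for l1, l2, f in pairs:
        s[l1] += f
        s[l2] += f
    return s


def filter_pairs(pairs, threshold=0.05):
    """P': keep (l1, l2, f) when f / S(l1) or f / S(l2) reaches the threshold."""
    s = total_coviews(pairs)
    return [(l1, l2, f) for l1, l2, f in pairs
            if f / s[l1] >= threshold or f / s[l2] >= threshold]


def sample_negatives(lectures, pairs, n, seed=0):
    """Sample n distinct lecture pairs that never occur together in P."""
    seen = {frozenset((l1, l2)) for l1, l2, _ in pairs}
    available = len(lectures) * (len(lectures) - 1) // 2 - len(seen)
    if n > available:
        raise ValueError("not enough never-co-viewed pairs to sample from")
    rng = random.Random(seed)
    negatives = set()
    while n > len(negatives):
        key = frozenset(rng.sample(lectures, 2))
        if key not in seen:
            negatives.add(key)
    return [tuple(sorted(k)) for k in negatives]


# Toy usage: positives get label +1, sampled negatives get label -1.
P = [("a", "b", 50), ("a", "c", 1), ("b", "c", 1), ("c", "d", 30)]
positives = filter_pairs(P)  # [('a', 'b', 50), ('c', 'd', 30)]
negatives = sample_negatives(["a", "b", "c", "d", "e"], P, len(positives))
training = [((l1, l2), +1) for l1, l2, _ in positives] + [(pair, -1) for pair in negatives]
```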
        <p>As Table 2 shows, the positive weights of features such as co-author similarity, event
similarity and LDA topic overlap support our intuitions about which attributes are shared by
frequently co-viewed lectures. The negative weights for description and slide-content
similarity are surprising. We suspect that this is because a large number of lectures in the
training data have empty values for these fields. The high positive weights for name
similarity and for the concatenated field combining the name, description and slide content
are not surprising: videos belonging to the same event, such as lectures from a course
series, are likely to have similar names and are also likely to be viewed together. For our
final run, we discarded the features with negative weights and retrained the classifier on
the remaining features.</p>
        <p>
          The classification setup treats all co-viewed pairs uniformly as positive instances.
However, since lectures with higher co-view frequencies are likely to be the most similar,
we also tried weighting pairs unequally by their co-view frequencies, formulating the task as
a regression or ranking problem with SVMs. We used the same features as in classification,
but replaced the class labels +1 and -1 with target values based on the co-view frequencies
of pairs. In the regression setting [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], the target similarity value of a pair instance (l1, l2, f) is defined as
<disp-formula><tex-math>s = \frac{f}{S(l_1) + S(l_2)}</tex-math></disp-formula>
and is normalized afterwards. In the ranking setting [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], the targets for each query lecture q are pairwise preferences according to co-view
frequency: among the lectures p paired with q in the training set, the higher the co-view
frequency of (p, q), the higher p is ranked. As Table 3 shows, in our preliminary
experiments the regression and ranking formulations performed worse than classification;
further experiments are required to understand this behavior.
        </p>
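        <p>The two alternative targets can be sketched as follows. The min-max scaling used for normalization is our assumption (the normalization method is not specified above), and the helper names are hypothetical; the resulting targets or preference pairs would then be passed to an SVM regression or ranking learner.</p>

```python
from collections import defaultdict


def regression_targets(pairs):
    """Target s = f / (S(l1) + S(l2)) per pair, min-max scaled to [0, 1].
    The min-max scaling is an assumption of this sketch."""
    S = defaultdict(int)
    for l1, l2, f in pairs:
        S[l1] += f
        S[l2] += f
    raw = {(l1, l2): f / (S[l1] + S[l2]) for l1, l2, f in pairs}
    low, high = min(raw.values()), max(raw.values())
    span = (high - low) or 1.0
    return {k: (v - low) / span for k, v in raw.items()}


def preference_pairs(pairs, query):
    """(better, worse) partner pairs for one query lecture q: a partner whose
    pair with q has a higher co-view frequency should be ranked higher."""
    partners = {}
    for l1, l2, f in pairs:
        if query in (l1, l2):
            partners[l2 if l1 == query else l1] = f
    ranked = sorted(partners, key=partners.get, reverse=True)
    return [(a, b) for i, a in enumerate(ranked) for b in ranked[i + 1:]
            if partners[a] > partners[b]]


P = [("q", "a", 10), ("q", "b", 5), ("q", "c", 5), ("a", "b", 2)]
print(preference_pairs(P, "q"))  # [('a', 'b'), ('a', 'c')]
```

        <p>Ties in co-view frequency, such as (q, b) and (q, c) above, produce no preference.</p>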
        <p>Task 2: Estimating Lecture Probabilities Given a Previously Seen Set</p>
        <p>Task 2 of the ECML/PKDD challenge models a common use case in recommender systems, where
the goal is to estimate which lecture a user is most likely to see, given a set of lectures
that the user has previously seen. The triplets provided from pooled sequences in the
datasets do not imply an ordering. The task seeks recommendations of the top ten lectures,
given that a particular set of three lectures was viewed previously.</p>
        <p>In theory, we could also use the setup of task 1 to derive predictions for task 2.
That is, we could classify each lecture not among the three seen lectures by pairing it with
each of the three lectures and designing a method to aggregate the individual scores.
However, we found that this method did not work well on the test dataset: it obtained a
score of 0.123, roughly a quarter of our final score. For task 2, the triplets-right and
triplets-left tables in the training data capture the sets of lectures that are commonly
seen together, from which triplet-lecture pair information can be derived. This data can
be used directly to estimate the likelihood that a particular lecture will be seen given a
set of three lectures. We illustrate this estimation with a simple example. Let li denote
lecture i, and let fijkl denote the frequency with which lectures i, j, k and l are seen
together. Assume that the following triplet-lecture pair information is available from the
training data.</p>
        <p>(l1, l2, l3; l4; f1234), (l1, l2, l3; l6; f1236), (l1, l3, l5; l4; f1354), (l2, l5, l6; l1; f2561)
From these triplet-left-right pairs, we can estimate the number of times lecture triples
such as (l1, l2, l3), (l1, l2, l4), (l3, l1, l4), (l2, l3, l5), … were seen in the training
data by using the associated frequency information. The number of times pairs of lectures
are seen together can be estimated similarly. Given a query triplet such as (li, lj, lk) and
a potential candidate lecture lp, we form the possible triplet and pair sets</p>
        <p>(li, lj, lp), (lj, lk, lp), (li, lk, lp), (li, lp), (lj, lp), (lk, lp)
and use the counts estimated from the training data to compute a score for the candidate lp
with respect to the given set of seen lectures (li, lj, lk). Clearly, not all possible pairs
and triplets are likely to be found in the training data, so a smoothing strategy is
required for cases where triplet or pair information is unavailable.</p>
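        <p>One way to carry out this estimation, sketched below, is to treat each triplet-lecture record (triplet; lecture; f) as f observations of the four-lecture set and to credit f to each of its three- and two-lecture subsets. This is our reading of the step rather than a specification from the challenge, the helper names are hypothetical, and the frequencies in the usage example are made up.</p>

```python
from collections import Counter
from itertools import combinations


def itemset_counts(records):
    """Estimate triple and pair counts from (triplet, lecture, f) records.

    Each record is read as f observations of the four-lecture set, so f is
    added to every 3- and 2-lecture subset of that set (keys are sorted tuples)."""
    triples, pairs = Counter(), Counter()
    for triplet, lecture, f in records:
        items = sorted(set(triplet) | {lecture})
        for t in combinations(items, 3):
            triples[t] += f
        for p in combinations(items, 2):
            pairs[p] += f
    return triples, pairs


def candidate_subsets(query, candidate):
    """The three triplets and three pairs formed from a query triplet and a candidate."""
    i, j, k = query
    trips = [(i, j, candidate), (j, k, candidate), (i, k, candidate)]
    prs = [(i, candidate), (j, candidate), (k, candidate)]
    return [tuple(sorted(t)) for t in trips], [tuple(sorted(p)) for p in prs]


# The records from the example above, with made-up frequencies.
records = [(("l1", "l2", "l3"), "l4", 10), (("l1", "l2", "l3"), "l6", 4),
           (("l1", "l3", "l5"), "l4", 3), (("l2", "l5", "l6"), "l1", 2)]
triples, pairs = itemset_counts(records)
print(triples[("l1", "l2", "l3")], triples[("l1", "l3", "l4")], pairs[("l1", "l2")])  # 14 13 16
```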
        <p>
          Note that the task 2 scenario, in which candidate lectures are scored given a set of
three seen lectures (a triplet), parallels the item-set mining task in market-basket analysis.
Market-basket analysis involves estimating the “interestingness” of particular items given
the transaction information of previous purchases. Inspired by this similarity, we design
our score to capture two basic concepts from item-set mining, namely support and
confidence [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The support of a set of items X, supp(X), is defined as the proportion of
transactions in the dataset that contain X, whereas the confidence of a rule X ⇒ Y, which is
interpreted as an estimate of the probability of seeing the set Y given that the item set X
was seen, is defined as
<disp-formula><tex-math>\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}.</tex-math></disp-formula>
        </p>
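        <p>The two definitions can be made concrete with a small Python sketch over explicit transactions. Here each transaction is an unweighted set of lectures, a simplification of the frequency-weighted data described above; the function names are illustrative.</p>

```python
def support(itemset, transactions):
    """supp(X): fraction of transactions that contain every item of X."""
    items = set(itemset)
    return sum(1 for t in transactions if items.issubset(t)) / len(transactions)


def confidence(x, y, transactions):
    """conf(X => Y) = supp(X u Y) / supp(X); defined as 0.0 when X never occurs."""
    sx = support(x, transactions)
    return support(set(x).union(y), transactions) / sx if sx else 0.0


# The four lecture sets from the earlier example, each counted once.
baskets = [{"l1", "l2", "l3", "l4"}, {"l1", "l2", "l3", "l6"},
           {"l1", "l3", "l5", "l4"}, {"l2", "l5", "l6", "l1"}]
print(support({"l1", "l2"}, baskets))                        # 0.75
print(round(confidence({"l1", "l2"}, {"l4"}, baskets), 4))   # 0.3333
```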
      </sec>
      <sec id="sec-2-3">
        <title>Algorithm Description</title>
        <p>The pseudo-code for computing the recommendation list for a given query triplet is
given in Algorithm 1. We assume that the auxiliary functions GetTriplets and GetPairs are
available. GetTriplets(T, li, lj) returns the set of all lectures lk that occur with li and
lj in the training data T. Similarly, GetPairs(T, li) returns all lectures that occur with
li in T. We start by collecting all lectures from the training data that occur with all
three pairs of lectures from the query triplet. The aggregate score for a lecture is
obtained by applying an aggregator function to the individual confidence values. We
experimented with ‘product’, ‘max’ and ‘sum’ as aggregator functions and found that
product performed best among those tried. If fewer than the desired number of
recommendations (an input parameter) are found, we relax the overlap criterion, first
considering lectures that occur with any two pairs of lectures from the triplet and finally
those that occur with any one pair. Algorithm 1 can also be used directly with the pairs
information from the training data, by obtaining candidate lectures with GetPairs instead
of GetTriplets.</p>
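        <p>A compact Python rendering of Algorithm 1 is sketched below. It is illustrative rather than our submitted code, and rests on assumptions: the hypothetical get_triplets stands in for GetTriplets and derives conf({a, b} ⇒ r) as count({a, b, r}) / count({a, b}) from precomputed triple and pair counts (equal to the ratio of supports, since both supports share the same denominator); the smoothing step simply pads R with a caller-supplied fallback list, such as pair-based or popular lectures; and the result is truncated to k.</p>

```python
from math import prod


def get_triplets(triples, pairs, a, b):
    """Hypothetical GetTriplets: map each lecture r seen with both a and b to
    conf({a, b} => r) = count({a, b, r}) / count({a, b})."""
    ab = pairs.get(tuple(sorted((a, b))), 0)
    if not ab:
        return {}
    conf = {}
    for key, n in triples.items():
        if a in key and b in key:
            (r,) = set(key) - {a, b}
            conf[r] = n / ab
    return conf


def recommend(triples, pairs, query, k, agg=prod, fallback=()):
    """Three-stage ranking of Algorithm 1, then padding with fallback lectures."""
    l1, l2, l3 = query
    seen = set(query)
    s1 = get_triplets(triples, pairs, l1, l2)
    s2 = get_triplets(triples, pairs, l1, l3)
    s3 = get_triplets(triples, pairs, l2, l3)
    confs = (s1, s2, s3)
    R = []

    def append_ranked(candidates, score):
        # Descending score; ties broken by lecture id for determinism.
        R.extend(sorted(candidates, key=lambda r: (-score(r), r)))

    # Stage 1: lectures seen with all three pairs of query lectures.
    r1 = set(s1).intersection(s2, s3) - seen
    append_ranked(r1, lambda r: agg([s[r] for s in confs]))

    # Stage 2: lectures seen with two of the pairs; a missing confidence counts as 1.
    r2 = set()
    for x, y in ((s1, s2), (s1, s3), (s2, s3)):
        r2 |= set(x).intersection(y)
    r2 -= seen | set(R)
    append_ranked(r2, lambda r: agg([s.get(r, 1.0) for s in confs]))

    # Stage 3: lectures seen with one pair; use the first available confidence.
    r3 = set(s1).union(s2, s3) - seen - set(R)
    append_ranked(r3, lambda r: next(s[r] for s in confs if r in s))

    # Smoothing: pad with fallback lectures until k recommendations are available.
    for r in fallback:
        if len(R) >= k:
            break
        if r not in seen and r not in R:
            R.append(r)
    return R[:k]


TRIPLES = {("l1", "l2", "l4"): 6, ("l1", "l3", "l4"): 3, ("l2", "l3", "l4"): 2,
           ("l1", "l2", "l5"): 4, ("l1", "l3", "l5"): 4, ("l2", "l3", "l6"): 5}
PAIRS = {("l1", "l2"): 10, ("l1", "l3"): 8, ("l2", "l3"): 5}
print(recommend(TRIPLES, PAIRS, ("l1", "l2", "l3"), k=5, fallback=["l7", "l1", "l8"]))
# ['l4', 'l5', 'l6', 'l7', 'l8']
```

        <p>Passing agg=max or agg=sum switches the aggregator; as in Algorithm 1, scores are compared only within a stage, so a lecture supported by all three pairs always precedes one supported by two.</p>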
        <p>In general, estimation based on triplets is more reliable, since it captures the
co-occurrence of a candidate lecture with two lectures of the query; our experiments
comparing triplets with pairs also illustrate this. Different strategies for combining the
scores from GetTriplets and GetPairs, and for smoothing, are a subject of future study. The
smoothing strategy mentioned in the pseudo-code uses popular lectures (those with a high
number of views) as recommendations when triplets related to the query lectures are
unavailable in the training data.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Observations</title>
        <p>To compute the estimates of lecture triples and pairs for task 2, we used the data
available in the tables triplets train left, triplets train right, task2 query and
pairs of the training data. This information was stored in memory and looked up during
calls to GetTriplets and GetPairs. For each query triplet of lectures in the test set, we
used Algorithm 1 to compute the recommendation list. When fewer than the desired number of
recommendations were found, smoothing strategies could be applied; we explored two: running
Algorithm 1 with GetPairs, and recommending the most popular lectures.</p>
        <p>In general, we found that scores computed from triplets rather than pairs result in
better recommendations. This is not surprising, since a lecture that co-occurs with more of
the lectures in the query triplet should be a better candidate for recommendation. We
experimented with sum, max and product as aggregation functions over the individual
confidence values. The performance with triplets, pairs and the different aggregation
functions (using Algorithm 1) is shown in Table 4. For our final run, we used the best
setting: Algorithm 1 with GetTriplets and product as the aggregation function.</p>
        <p>Although the task description states that there is no sequence information among the
lectures of a triplet, the description of how the dataset was created suggests an implicit
ordering among them. The scores in Table 4 were obtained from runs that assume sequence
information among the lectures of a query triplet. The last row shows a run that assumes
that the lectures in a query triplet are unordered and combines all possible orderings into
the frequency information. That this run scores worse than the others suggests that the
lectures of a triplet in the given dataset do carry ordering information.</p>
        <p>Algorithm 1 Computing Recommendations Using Lecture Triplets</p>
        <preformat>Input: T (set of triplets and their frequencies from the training data),
       query lecture triplet q = &lt;l1, l2, l3&gt;,
       k (number of recommendations desired)
Output: R (recommendation list for q)

R ← ∅
S1 ← GetTriplets(T, l1, l2)
S2 ← GetTriplets(T, l1, l3)
S3 ← GetTriplets(T, l2, l3)
R1 ← (S1 ∩ S2 ∩ S3) \ {l1, l2, l3}
for all r ∈ R1 do
    score(r) ← AggFunc(conf({l1, l2} ⇒ r), conf({l1, l3} ⇒ r), conf({l2, l3} ⇒ r))
end for
Sort R1 in descending order of score and append to R
R2 ← ((S1 ∩ S2) ∪ (S1 ∩ S3) ∪ (S2 ∩ S3)) \ (R ∪ {l1, l2, l3})
for all r ∈ R2 do
    f1 ← 1; f2 ← 1; f3 ← 1
    if r ∈ S1 then f1 ← conf({l1, l2} ⇒ r) end if
    if r ∈ S2 then f2 ← conf({l1, l3} ⇒ r) end if
    if r ∈ S3 then f3 ← conf({l2, l3} ⇒ r) end if
    score(r) ← AggFunc(f1, f2, f3)
end for
Sort R2 in descending order of score and append to R
R3 ← (S1 ∪ S2 ∪ S3) \ (R ∪ {l1, l2, l3})
for all r ∈ R3 do
    if r ∈ S1 then score(r) ← conf({l1, l2} ⇒ r)
    else if r ∈ S2 then score(r) ← conf({l1, l3} ⇒ r)
    else if r ∈ S3 then score(r) ← conf({l2, l3} ⇒ r)
    end if
end for
Sort R3 in descending order of score and append to R
if |R| &lt; k then
    append to R the top lectures recommended using pairs until |R| = k
end if
return R</preformat>
        <p>Using the techniques described in Sections 3 and 4, we obtained MARp scores of
0.2456 for task 1 and 0.4843 for task 2. The top-performing system in the ECML/PKDD
Discovery Challenge obtained scores of 0.35857 and 0.62415 on tasks 1 and 2, respectively.
Our system was ranked 8th out of 62 participating teams for task 1 and 4th out of 22
participating teams for task 2.</p>
        <p>Since the ground-truth predictions for the test data are now available, our current
focus is on an error analysis to improve the performance of our techniques. Further study
is needed to understand the performance difference between the SVM classification
formulation and the regression and ranking formulations. Moreover, in our experiments we
fit a single model over all lectures in the training data; it may be possible to cluster
the lectures and learn a separate model for each cluster. For task 2, our scoring function
uses simple estimates of confidence based on the triplets seen in the training data. For
queries where the required information is missing, back-up options based on the content of
the query lectures (such as our model for task 1) could be used. Other smoothing strategies
and combination methods also need to be studied carefully.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Adomavicius</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuzhilin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions</article-title>
          .
          <source>IEEE TKDE</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srikant</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Fast algorithms for mining association rules in large databases</article-title>
          .
          <source>In: VLDB</source>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Antulov-Fantulin</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bošnjak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Šmuc</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jermol</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Žnidaršič</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grčar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keše</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavrač</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>ECML/PKDD 2011 Discovery Challenge: “videolectures.net Recommender System Challenge”</article-title>
          . http://tunedit.org/challenge/VLNetChallenge (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Baluja</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seth</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sivakumar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jing</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yagnik</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravichandran</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aly</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Video suggestion and discovery for YouTube: taking random walks through the view graph</article-title>
          .
          <source>In: WWW</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Latent Dirichlet allocation</article-title>
          .
          <source>J. Mach. Learn. Res</source>
          . (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>C.J.C.</given-names>
          </string-name>
          :
          <article-title>A tutorial on support vector machines for pattern recognition</article-title>
          .
          <source>Data Min. Knowl. Discov</source>
          . (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Debnath</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganguly</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Feature weighting in content based recommendation system using social network analysis</article-title>
          .
          <source>In: WWW</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gantner</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drumond</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freudenthaler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rendle</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt-Thieme</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Learning Attribute-to-Feature mappings for Cold-Start recommendations</article-title>
          .
          <source>In: ICDM</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Making large-scale support vector machine learning practical</article-title>
          .
          <source>In: Advances in Kernel Methods: Support Vector Learning</source>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>184</lpage>
          . MIT Press (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Optimizing search engines using clickthrough data</article-title>
          .
          <source>In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <fpage>133</fpage>
          -
          <lpage>142</lpage>
          . KDD '02, ACM, New York, NY, USA (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Linden</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>York</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Amazon.com recommendations: Item-to-item collaborative filtering</article-title>
          .
          <source>IEEE Internet Computing</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Introduction to Information Retrieval</article-title>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pazzani</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Billsus</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Content-Based recommendation systems</article-title>
          .
          <source>In: The Adaptive Web</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Strohman</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Recommending citations for academic papers</article-title>
          .
          <source>In: SIGIR '07</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>