<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Time-Aware Semantic enriched Recommender Systems for movies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marko Harasic</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Ahrendt</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandru Todor</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrian Paschke</string-name>
        </contrib>
        <aff>FU Berlin</aff>
      </contrib-group>
      <abstract>
        <p>As the World Wide Web moves from passive to active use, recommender systems play a prominent role as decision aids. They enable users to find new items of high personal interest of which they were previously unaware. While traditional approaches have been shown to generate high-quality recommendations, they still lack the use of background knowledge to describe the items and the users' preferences on a more granular level. Furthermore, these approaches do not take contextual information into consideration, in which the dimension 'time' plays a significant role. In this paper, we propose a new approach for recommending movies, which semantically enriches the process of generating recommendations by using a taxonomy derived from different data sources in the LOD-Cloud. The paper also addresses the interplay between the rating behavior of the users and the dimension 'time'.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The amount of information on the World Wide Web grows continuously, causing
users to be overwhelmed by the sheer volume of data. Users need a mechanism
that aids them in choosing the most useful item.
Recommender systems collect the users' actions and infer their preferences from them. The
system then generates an internal representation of the user, the user profile. By
reducing the search space based on the user's preferences, items of low personal interest are
gradually removed from the displayed items, improving the search results and
simplifying navigation.</p>
      <p>
        Adomavicius and Tuzhilin [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] established a commonly used classification of
recommenders according to their approach. They distinguish content-based
filtering, collaborative filtering and hybrid recommenders. Content-based
filtering (CBF) analyses items in advance: their attributes are extracted and a
representation of each item is generated [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Afterwards, the items most similar to the user's highest-rated items are
recommended. In contrast to the item similarity of CBF, collaborative filtering (CF)
forms a neighbourhood from the users whose preferences are most similar to those of
the active user. Items are
then drawn from these users [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Hybrid recommenders combine the two types
of recommenders into a new system [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. One approach is "collaboration via
content" [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], in which user profiles do not consist of ratings on items but contain the item
attributes. Especially when combined with background knowledge, e.g. given by a
taxonomy of attributes, hybrid recommenders outperform the other systems [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
A special type is the semantic recommender, which uses background knowledge
derived from taxonomies described in semantic web languages [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Recommendations based on traditional algorithms do not consider
contextual information such as date, place, companion and mood. Among the different
contextual aspects, the dimension 'time' can be identified as the most important
one [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. As a user's preferences change, the 'time' information makes it possible
to track the evolution of the user's habits. Ding and Li proposed an exponential decay
rate on the ratings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Older ratings are considered less significant, giving
more recent ratings a higher importance. These newer ratings should reflect the
preferences of a user to a higher degree.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <p>The proposed algorithm follows the "collaboration via content" approach. User
profiles contain the preferred attributes, derived from the item representations.
They are further extended by using an attribute taxonomy, which assigns the
attributes to appropriate superclasses and hence makes it possible to take
connections between different but related attributes into account. An exponential dampening factor
is applied to reflect the generality of attributes nearer to the root. Following
the assumption that old ratings have less influence on the preferences than
newer ones, old ratings are weighted less by applying a decay rate.</p>
      <p>The approach is formally described as follows: Let U be the set of users,
I the set of items and R the system's rating scale as a totally ordered set of
values. Then ut : U × I → R is the utility function that calculates the
usefulness of a single item i to the user u. In order to create a representation
A(i) of an item i ∈ I, its direct attributes D(i) := {a<sub>j</sub> | a<sub>j</sub> ∈ T} are
determined and weighted by the function w(i, a<sub>j</sub>), which specifies the significance
of an attribute and follows the TF-IDF metric. Besides the direct attributes
a<sub>j</sub>, the representation contains their indirect attributes sa<sub>d</sub>(a<sub>j</sub>). This
hierarchy of attributes forms the attribute taxonomy T. Based on its structure, the
path pa(a<sub>j</sub>) = (sa<sub>0</sub>(a<sub>j</sub>), …, sa<sub>d</sub>(a<sub>j</sub>), …, ra) from the attribute sa<sub>0</sub>(a<sub>j</sub>) ≡ a<sub>j</sub> to the
root ra is determined. Each attribute is weighted according to its distance d
from a<sub>j</sub> by the height function h(sa<sub>d</sub>(a<sub>j</sub>)) = β<sup>d</sup>, whereby the hierarchy
influence parameter β &gt; 0 adjusts the influence of indirect attributes. To reflect the
different weighting schemes of direct and indirect attributes, the weighting
function w(i, a<sub>j</sub>) takes possible multiple occurrences of the indirect attributes into
account. The representation A(i) of an item i is then defined as:</p>
      <disp-formula id="eq-1">
        <label>(1)</label>
        <tex-math><![CDATA[
A(i) := \{(a_j, w(i, a_j)) \mid a_j \in D(i)\} \;\cup \bigcup_{a_j \in D(i)} \{(sa_j, w(i, sa_j)) \mid sa_j \in pa(a_j)\}
        ]]></tex-math>
      </disp-formula>
      <p>with</p>
      <disp-formula id="eq-2">
        <label>(2)</label>
        <tex-math><![CDATA[
w(i, a) =
\begin{cases}
\dfrac{1}{|A(i)|} \, \log \dfrac{|I|}{n(a)} & \text{if } a \text{ is a direct attribute} \\[2ex]
\sum\limits_{a_j \in D(i) \,\wedge\, a \in pa(a_j)} h(a) \, w(i, a_j) & \text{otherwise}
\end{cases}
        ]]></tex-math>
      </disp-formula>
      <p>where n(a) is the number of items containing the attribute a.
Based on the item representations A(i) of the user's rated items I(u), the user's profile
p(u) := {(a<sub>j</sub>, pr(u, a<sub>j</sub>)) | a<sub>j</sub> ∈ A(i) ∧ i ∈ I(u)} can be constructed. It contains
the items' attributes a<sub>j</sub> and their preference weights pr(u, a<sub>j</sub>, t), which express the
degree of interest in them at the point in time t. Profiles are generated
iteratively. Starting with the oldest rated item A(i)<sub>0</sub>, the profile is filled with attributes and
their weights. Only those items are taken into account which are rated higher
than the mean of the system's rating scale. In order to reflect preference
changes over time, the time decay factor 0 ≤ α ≤ 1 is applied to the contained
attributes when a new item A(i)<sub>t</sub> is rated at a later point in time t:</p>
      <disp-formula id="eq-4">
        <label>(4)</label>
        <tex-math><![CDATA[
pr(u, a_j, t) =
\begin{cases}
(1 - \alpha) \, pr(u, a_j, t - 1) + w(i, a_j) & \text{if } a_j \in A(i)_t \\
(1 - \alpha) \, pr(u, a_j, t - 1) & \text{otherwise}
\end{cases}
        ]]></tex-math>
      </disp-formula>
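      <p>The construction of A(i) and the iterative profile update can be illustrated with a small Python sketch. It is not the authors' implementation: the function names, the child-to-parent dictionary parent and the toy data are assumptions made for illustration. The sketch further assumes a height function of the form h = β^d for an ancestor at distance d, a direct weight of (1/|A(i)|) · log(|I|/n(a)), and that items rated at or below the scale mean leave the profile untouched.</p>
      <preformat><![CDATA[
```python
import math


def path_to_root(attr, parent):
    """Return the ancestors of attr, nearest first, ending at the root."""
    path = []
    while attr in parent:
        attr = parent[attr]
        path.append(attr)
    return path


def item_representation(direct, parent, item_count, total_items, beta):
    """Build A(i) as a dict attribute -> weight.

    Direct attributes get a TF-IDF-style weight (assumed form); an ancestor
    at distance d receives beta**d times the direct weight, summed over all
    direct attributes below it.
    """
    paths = {a: path_to_root(a, parent) for a in direct}
    all_attrs = set(direct).union(*paths.values())
    weights = {
        a: (1.0 / len(all_attrs)) * math.log(total_items / item_count[a])
        for a in direct
    }
    for a, path in paths.items():
        for d, ancestor in enumerate(path, start=1):
            if ancestor in direct:
                continue  # direct attributes keep their direct weight
            weights[ancestor] = weights.get(ancestor, 0.0) + beta ** d * weights[a]
    return weights


def update_profile(profile, item_weights, alpha):
    """Decay every stored preference by (1 - alpha), then add the new item."""
    updated = {a: (1.0 - alpha) * p for a, p in profile.items()}
    for a, w in item_weights.items():
        updated[a] = updated.get(a, 0.0) + w
    return updated


def build_profile(rated_items, representations, scale_mean, alpha):
    """rated_items: iterable of (item, rating, timestamp), processed oldest first."""
    profile = {}
    for item, rating, _timestamp in sorted(rated_items, key=lambda r: r[2]):
        if rating > scale_mean:  # assumption: lower-rated items are skipped entirely
            profile = update_profile(profile, representations[item], alpha)
    return profile


# Toy data (hypothetical): one movie tagged "cyberpunk", root "film".
parent = {"cyberpunk": "sci-fi", "sci-fi": "film"}
rep = item_representation({"cyberpunk"}, parent, {"cyberpunk": 10}, 100, beta=0.6)
profile = build_profile([("m1", 5, 1)], {"m1": rep}, scale_mean=3, alpha=0.7)
```
]]></preformat>
      <p>In this toy case the representation holds three attributes, and the weights of sci-fi and film are 0.6 and 0.36 times the weight of cyberpunk, respectively.</p>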
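      <p>By applying the Pearson correlation between the user profiles, the similarity
sim(u, u′) of two users u and u′ can be calculated. The preference
weights pr(u, a) of the common attributes ca(u, u′) := {a | a ∈ p(u) ∩ p(u′)} are the
variables used for the correlation:</p>
      <disp-formula id="eq-5">
        <label>(5)</label>
        <tex-math><![CDATA[
sim(u, u') = \frac{\sum\limits_{a \in ca(u,u')} \bigl(pr(u, a) - \overline{pr}(u)\bigr) \bigl(pr(u', a) - \overline{pr}(u')\bigr)}
{\sqrt{\sum\limits_{a \in ca(u,u')} \bigl(pr(u, a) - \overline{pr}(u)\bigr)^2} \; \sqrt{\sum\limits_{a \in ca(u,u')} \bigl(pr(u', a) - \overline{pr}(u')\bigr)^2}}
        ]]></tex-math>
      </disp-formula>
      <p>Here the overlined term denotes the mean preference weight of the respective user. The prediction ut(u, i) for a single item i is
defined as the weighted sum over the k most similar users u′ ∈ N(u, k) who have rated i,
combining their similarities sim(u, u′) to u and their ratings r<sub>u′,i</sub> on i:</p>
      <disp-formula id="eq-6">
        <label>(6)</label>
        <tex-math><![CDATA[
ut(u, i) = \frac{\sum\limits_{u' \in N(u,k)} sim(u, u') \, r_{u',i}}{\sum\limits_{u' \in N(u,k)} \lVert sim(u, u') \rVert}
        ]]></tex-math>
      </disp-formula>
      <p>By using ut(u, i), the
recommendation set RS(u, N) for a user u can be generated. It contains the N
unseen items which have the highest predicted ratings:</p>
      <disp-formula id="eq-7">
        <label>(7)</label>
        <tex-math><![CDATA[
RS(u, N) := \{\, i_j \mid i_j \in I \setminus I(u) \,\wedge\, ut(u, i_j) \geq ut(u, i_{j+1}) \,\}
        ]]></tex-math>
      </disp-formula>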
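      <p>The similarity, prediction and top-N selection steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names and the toy profiles and ratings are hypothetical, the means in the Pearson correlation are taken over the common attributes only, and users for whom no neighbour has rated an item simply receive no prediction for it.</p>
      <preformat><![CDATA[
```python
import math


def pearson(profile_u, profile_v):
    """Pearson correlation over the attributes common to both profiles."""
    common = sorted(profile_u.keys() & profile_v.keys())
    if len(common) < 2:
        return 0.0
    # Assumption: means are computed over the common attributes.
    mean_u = sum(profile_u[a] for a in common) / len(common)
    mean_v = sum(profile_v[a] for a in common) / len(common)
    num = sum((profile_u[a] - mean_u) * (profile_v[a] - mean_v) for a in common)
    den = math.sqrt(sum((profile_u[a] - mean_u) ** 2 for a in common)) * math.sqrt(
        sum((profile_v[a] - mean_v) ** 2 for a in common)
    )
    return num / den if den else 0.0


def predict(user, item, profiles, ratings, k):
    """Similarity-weighted mean rating of the k most similar users who rated item."""
    candidates = [
        (pearson(profiles[user], profiles[v]), v)
        for v in profiles
        if v != user and item in ratings.get(v, {})
    ]
    neighbours = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    norm = sum(abs(s) for s, _ in neighbours)
    if norm == 0:
        return None  # no usable neighbour
    return sum(s * ratings[v][item] for s, v in neighbours) / norm


def recommend(user, items, profiles, ratings, k, n):
    """Return the n unseen items with the highest predicted rating."""
    seen = ratings.get(user, {})
    scored = []
    for item in items:
        if item in seen:
            continue
        score = predict(user, item, profiles, ratings, k)
        if score is not None:
            scored.append((score, item))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [item for _, item in scored[:n]]


# Toy data (hypothetical): v agrees with u, w has opposite preferences.
profiles = {
    "u": {"a": 1.0, "b": 2.0, "c": 3.0},
    "v": {"a": 2.0, "b": 4.0, "c": 6.0},
    "w": {"a": 3.0, "b": 2.0, "c": 1.0},
}
ratings = {"u": {}, "v": {"m1": 5, "m2": 3}, "w": {"m1": 1, "m2": 5}}
top = recommend("u", ["m1", "m2"], profiles, ratings, k=2, n=1)
```
]]></preformat>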
    </sec>
    <sec id="sec-3">
      <title>Data</title>
      <p>The MovieLens project provides a dataset consisting of 3,500 users who rated
6,000 movies with 1,000,000 ratings. The dataset was enriched with URIs which identify
the movies in DBpedia. A top-down approach was applied, starting with the DBpedia URI
for Category:Film and going down the category tree via skos:broader until it reached
the categories assigned to the movies. The created taxonomy T
consisted of 3,804 direct attributes a of the movies and a further 691 indirect
attributes sa, with a height h of 12.</p>
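      <p>A top-down extraction of this kind could look like the following Python sketch. It is only an illustration: the dictionary narrower (mapping a category to the categories that point to it via skos:broader), the category names, the max_depth cut-off and the choice of a single parent per category (the first one found by breadth-first search) are assumptions, not details taken from the described system.</p>
      <preformat><![CDATA[
```python
from collections import deque


def build_taxonomy(root, narrower, movie_categories, max_depth):
    """Breadth-first walk from root; returns a child -> parent map that is
    restricted to branches leading to a category assigned to a movie."""
    parent = {}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] >= max_depth:
            continue
        for child in narrower.get(node, ()):
            if child not in depth:  # first visit wins; also guards against cycles
                depth[child] = depth[node] + 1
                parent[child] = node
                queue.append(child)

    # Keep only categories on a path between the root and a movie category.
    keep = set()
    for category in movie_categories:
        node = category
        while node in parent and node not in keep:
            keep.add(node)
            node = parent[node]
    return {child: par for child, par in parent.items() if child in keep}


# Toy category graph (hypothetical names).
narrower = {
    "Film": ["Films by genre", "Film awards"],
    "Films by genre": ["Science fiction films"],
    "Science fiction films": ["Cyberpunk films"],
    "Film awards": ["Award ceremonies"],
}
taxonomy = build_taxonomy("Film", narrower, {"Cyberpunk films"}, max_depth=12)
```
]]></preformat>
      <p>In the toy graph the awards branch is dropped because no movie category lies below it.</p>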
      <p>Other knowledge bases such as Freebase extract information from multiple
data sources. One interesting aspect of Freebase is the way it categorizes films
into genres by the property film.film_genre. The film.genre taxonomy contains a
total of 700 genres which are related to each other via child genre relations. In
order to use the Freebase film genres in our approach, we first had to extract all
the genres and their relations, and then proceeded to infer a genre taxonomy
where the genres are related to each other via subclass relations. This taxonomy
contains 700 direct attributes a assigned to the movies and 238 indirect attributes
sa which are contained inside the taxonomy T with a height h of 4.</p>
      <p>The DBpedia taxonomy contains more distinct attributes due to the higher
number of film-related categories in comparison to the genres in Freebase.
Furthermore, the Freebase taxonomy is rather shallow in comparison to the DBpedia
taxonomy. The number of distinct indirect attributes represents the nodes that
are not present in the direct item description but are generated by taking
the superclasses of those categories/genres into account. For example, for a movie
that has cyberpunk as a genre we would also take sci-fi into account as a genre,
since it is the superclass of cyberpunk.</p>
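      <p>How the counts of direct attributes, indirect attributes and the height relate to each other can be shown with a short Python sketch. The function name, the toy genre assignments and the reading of "height" as the longest path from a direct attribute to the root are assumptions made for this illustration.</p>
      <preformat><![CDATA[
```python
def taxonomy_stats(item_attrs, parent):
    """Return (#direct attributes, #indirect attributes, height).

    item_attrs: item -> set of attributes assigned to it
    parent:     attribute -> its superclass (the root has no entry)
    """
    direct = set().union(*item_attrs.values())
    indirect = set()
    height = 0
    for attr in direct:
        distance = 0
        node = attr
        while node in parent:
            node = parent[node]
            distance += 1
            if node not in direct:
                indirect.add(node)  # superclass that no item names directly
        height = max(height, distance)
    return len(direct), len(indirect), height


# Toy data (hypothetical): sci-fi is direct for m2 and a superclass for m1.
item_attrs = {"m1": {"cyberpunk"}, "m2": {"sci-fi", "drama"}}
parent = {"cyberpunk": "sci-fi", "sci-fi": "film", "drama": "film"}
stats = taxonomy_stats(item_attrs, parent)
```
]]></preformat>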
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>
        To evaluate the system, a 5-fold cross validation was performed with each set
containing 1,200 users. Each of the 5 evaluation runs uses 4 different sets as training sets
U<sub>t</sub> and the remaining set U<sub>e</sub> as the test set. The system was first evaluated to find
the optimal values for k, the time decay α and the hierarchy influence β, using the F1@N metric [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. To determine the
accuracy of the system, the mean value of F1(u)@N over all users of the test
set u ∈ U<sub>e</sub> is calculated by comparing the items of RS(u, N) with the relevant
ones R<sub>u</sub> (i.e. rated higher than the average) for the user u. rel<sub>i,u</sub> is the binary
relevance value of item i for user u.
      </p>
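      <p>The evaluation protocol can be sketched in Python as follows. The text does not spell out F1(u)@N, so the sketch assumes the standard definition as the harmonic mean of precision@N and recall@N, with precision computed over N; the function names, the round-robin fold assignment and the toy data are likewise assumptions.</p>
      <preformat><![CDATA[
```python
def f1_at_n(recommended, relevant, n):
    """Harmonic mean of precision@n and recall@n for a single user."""
    hits = sum(1 for item in recommended[:n] if item in relevant)
    if hits == 0:
        return 0.0
    precision = hits / n
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def mean_f1_at_n(recommendations, relevant_items, n):
    """Average F1@n over all test users (the keys of `recommendations`)."""
    if not recommendations:
        return 0.0
    scores = [
        f1_at_n(recs, relevant_items.get(user, set()), n)
        for user, recs in recommendations.items()
    ]
    return sum(scores) / len(scores)


def user_folds(users, folds=5):
    """Split the users round-robin into `folds` disjoint sets."""
    return [users[i::folds] for i in range(folds)]


# Toy data (hypothetical): two of five recommendations are relevant.
folds = user_folds(list(range(10)))
score = f1_at_n(["a", "b", "c", "d", "e"], {"a", "c", "x", "y"}, n=5)
```
]]></preformat>
      <p>In each of the five runs, one fold would serve as the test set and the other four as training sets.</p>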
      <p>
        As baseline algorithm, the widely used item-to-item collaborative filtering
algorithm using the adjusted cosine similarity [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] was implemented. While the
DBpedia recommender used a taxonomy for enriching the profiles, the other
systems only use the attributes taken from DBpedia and Freebase, respectively, without a
taxonomy. Each of the following figures shows the accuracy of the system with
regard to the evaluated parameter as well as the accuracy of the baseline algorithm.
All results are for recommendation sets of size N = 5.
      </p>
      <p>[Fig. 1: accuracy over neighborhood size k; legend entry: attribute CF]</p>
      <p>[Figure: accuracy over time decay]</p>
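      <p>For reference, the adjusted cosine similarity used by the item-to-item baseline can be sketched in Python as below. The function name, the nested-dictionary rating layout and the toy ratings are assumptions for illustration; the sketch covers only the similarity, not the full baseline recommender.</p>
      <preformat><![CDATA[
```python
import math


def adjusted_cosine(item_a, item_b, ratings):
    """Adjusted cosine similarity between two items.

    ratings: user -> {item: rating}. Each user's mean rating is subtracted
    before comparing the two items over the users who rated both.
    """
    num = den_a = den_b = 0.0
    for user_ratings in ratings.values():
        if item_a in user_ratings and item_b in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            dev_a = user_ratings[item_a] - mean
            dev_b = user_ratings[item_b] - mean
            num += dev_a * dev_b
            den_a += dev_a ** 2
            den_b += dev_b ** 2
    den = math.sqrt(den_a) * math.sqrt(den_b)
    return num / den if den else 0.0


# Toy ratings (hypothetical): items a and b are rated alike, c the opposite way.
ratings = {
    "u1": {"a": 5, "b": 5, "c": 1},
    "u2": {"a": 4, "b": 4, "c": 2},
}
sim_ab = adjusted_cosine("a", "b", ratings)
sim_ac = adjusted_cosine("a", "c", ratings)
```
]]></preformat>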
      <p>Different neighbourhood sizes k have a significant influence on the accuracy.
Figure 1 shows the influence of k. The different approaches achieved
their highest accuracy at about k = 120. Smaller values for k result in a degradation
of the accuracy. If the neighbourhood is too small, the few contained users have a
strong impact on the predictions. Larger neighbourhoods do not influence
the accuracy any further. For the taxonomic approach the value of k was higher, because
more users with shared general preferences can be considered, hence achieving
a higher accuracy.
      </p>
      <p>Fig. 3. Impact of hierarchy influence on the accuracy of the recommendations</p>
      <p>The hierarchy influence β has to be adjusted according to the height h of the
used taxonomy. Figure 3 shows its impact on the accuracy of the system. While too
small and too large values lower the accuracy, β = 0.6 could be
determined as the optimal value. When β is chosen too low, the additional information gained from the
taxonomy has a negligible effect and the system behaves similarly to the concept-only
recommender. The taxonomy makes it possible to find users who have related preferences
in movie genres, e.g. in dystopian sci-fi and in cyberpunk. But if the value for β
is set too large, the concepts nearer the root gain a higher influence, resulting in a
loss of precision in the preference representation of the users. Users are then
treated equally, even if they are only loosely connected.</p>
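      <p>As a worked illustration, and assuming the height function has the form h = β^d for an ancestor at distance d (an assumption about the formula, not a reported measurement), the value β = 0.6 yields the following relative weights:</p>
      <disp-formula>
        <tex-math><![CDATA[
0.6^{1} = 0.6, \qquad 0.6^{2} = 0.36, \qquad 0.6^{3} = 0.216, \qquad 0.6^{12} \approx 0.0022
        ]]></tex-math>
      </disp-formula>
      <p>Under this assumption, an ancestor twelve levels above a direct attribute, which is the height reported for the DBpedia taxonomy, would contribute well below one percent of the direct attribute's weight, whereas the immediate superclass would still contribute 60 percent.</p>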
      <p>Figure 4 shows the accuracy according to the used F1@5 metric. Each
algorithm used the values of k, the time decay α and the hierarchy influence β at which it achieved its highest
accuracy. Since both taxonomic approaches behaved similarly with regard to the
time influence, its value was set to α = 0.7 for them and to α = 0.1 for the attribute-only
recommender. For the taxonomy-using system, β = 0.6 and k = 120 were used.</p>
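      <p>To give a feeling for these settings, assume that every newly rated item multiplies all stored preference weights by (1 − α), with α denoting the time decay. This is an illustrative reading of the decay, not a reported result. After n further ratings, an untouched preference weight is then scaled by (1 − α)^n:</p>
      <disp-formula>
        <tex-math><![CDATA[
\alpha = 0.7:\; 0.3^{1} = 0.3,\; 0.3^{2} = 0.09,\; 0.3^{3} = 0.027
\qquad\qquad
\alpha = 0.1:\; 0.9^{1} = 0.9,\; 0.9^{2} = 0.81,\; 0.9^{3} = 0.729
        ]]></tex-math>
      </disp-formula>
      <p>Under this assumption, α = 0.7 lets older preferences fade within a few ratings, while α = 0.1 retains them far longer.</p>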
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future work</title>
      <p>The paper proposed a new approach for semantically enriching the process of
recommending items by using a taxonomy derived from the LOD-Cloud. It
outperforms the baseline algorithm to a significant level. In addition, the
taxonomy had a positive influence on the accuracy of the recommendations. Since
the DBpedia recommender uses the same concepts as the attribute recommender
and follows the same paradigm, its significantly higher accuracy can be attributed to the use
of the taxonomy. In a nutshell, the proposed approach seems to be well suited
to work with the structure given by the DBpedia category taxonomy.</p>
      <p>In line with our assumption, the degradation of each user's preferences over
the usage period has a positive impact on the accuracy. However, some preferences
persist to the present even if they were first captured at the beginning.
Therefore, the degradation of preferences has to be considered individually for each
user and each of the user's preferences. A deeper analysis of the influence of the aspect
'time' on the proposed approach is out of the scope of this paper and will be addressed
separately in forthcoming papers.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Gediminas</given-names>
            <surname>Adomavicius</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Tuzhilin</surname>
          </string-name>
          .
          <article-title>Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>17</volume>
          (
          <issue>6</issue>
          ):
          <fpage>734</fpage>
          –
          <lpage>749</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Robin D.</given-names>
            <surname>Burke</surname>
          </string-name>
          .
          <article-title>Hybrid web recommender systems</article-title>
          .
          <source>In The Adaptive Web</source>
          , pages
          <fpage>377</fpage>
          –
          <lpage>408</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Yi</given-names>
            <surname>Ding</surname>
          </string-name>
          and
          <string-name>
            <given-names>Xue</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Time weight collaborative filtering</article-title>
          .
          <source>In Proceedings of the 14th ACM International Conference on Information and Knowledge Management</source>
          ,
          <source>CIKM '05</source>
          , pages
          <fpage>485</fpage>
          –
          <lpage>492</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
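          <string-name>
            <given-names>Jonathan L.</given-names>
            <surname>Herlocker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Joseph A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Loren G.</given-names>
            <surname>Terveen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>John T.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Evaluating collaborative filtering recommender systems</article-title>
          .
          <source>ACM Trans. Inf. Syst.</source>
          ,
          <volume>22</volume>
          :
          <fpage>5</fpage>
          –
          <lpage>53</lpage>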
          ,
          <year>January 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Marius</given-names>
            <surname>Kaminskas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Ricci</surname>
          </string-name>
          .
          <article-title>Contextual music information retrieval and recommendation: State of the art and challenges</article-title>
          .
          <source>Computer Science Review</source>
          ,
          <volume>6</volume>
          (
          <issue>2-3</issue>
          ):
          <fpage>89</fpage>
          –
          <lpage>119</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Ken</given-names>
            <surname>Lang</surname>
          </string-name>
          .
          <article-title>Newsweeder: Learning to filter netnews</article-title>
          .
          <source>In ICML</source>
          , pages
          <fpage>331</fpage>
          –
          <lpage>339</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Michael J.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          .
          <article-title>A framework for collaborative, content-based and demographic filtering</article-title>
          .
          <source>Artif. Intell. Rev.</source>
          ,
          <volume>13</volume>
          (
          <issue>5-6</issue>
          ):
          <fpage>393</fpage>
          –
          <lpage>408</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>E.</given-names>
            <surname>Peis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Morales del Castillo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Delgado-Lopez</surname>
          </string-name>
          .
          <article-title>Semantic recommender systems. Analysis of the state of the topic</article-title>
          .
          <source>Hipertext.net, 6:(online)</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Paul</given-names>
            <surname>Resnick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Neophytos</given-names>
            <surname>Iacovou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mitesh</given-names>
            <surname>Suchak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Bergstrom</surname>
          </string-name>
          , and
          <string-name>
            <given-names>John</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Grouplens: An open architecture for collaborative filtering of netnews</article-title>
          .
          <source>In CSCW</source>
          , pages
          <fpage>175</fpage>
          –
          <lpage>186</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Badrul</given-names>
            <surname>Sarwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>George</given-names>
            <surname>Karypis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Joseph</given-names>
            <surname>Konstan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>John</given-names>
            <surname>Reidl</surname>
          </string-name>
          .
          <article-title>Item-based collaborative filtering recommendation algorithms</article-title>
          .
          <source>In Proceedings of the 10th international conference on World Wide Web, WWW '01</source>
          , pages
          <fpage>285</fpage>
          –
          <lpage>295</lpage>
          , New York, NY, USA,
          <year>2001</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Cai-Nicolas</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Georg</given-names>
            <surname>Lausen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Lars</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          .
          <article-title>Taxonomy-driven computation of product recommendations</article-title>
          .
          <source>In CIKM</source>
          , pages
          <fpage>406</fpage>
          –
          <lpage>415</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>