<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Top-N Book Recommendations Using Wikipedia</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nitish Aggarwal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kartik Asooja</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jyoti Jha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Buitelaar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Insight Centre for Data Analytics National University of Ireland Galway</institution>
          ,
          <addr-line>Ireland IIIT Hyderabad</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents an approach for recommending a ranked list of books to a user. A user profile is defined by a few liked and disliked books. To recommend a book, we calculate the semantic relatedness of the given book to the liked and disliked books by using Wikipedia. Based on the obtained scores, we predict ratings of the book. We evaluate our approach on a dataset that consists of 6,181 users, 8,171 books and 67,990 user-item pairs for which the rating is to be predicted.</p>
      </abstract>
      <kwd-group>
        <kwd>Top N Recommendations</kwd>
        <kwd>Semantic Relatedness</kwd>
        <kwd>Wikipedia</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Wikipedia provides a valuable source of background knowledge about millions
of entities such as movies, actors, places and books. This knowledge can be
exploited to build a Top-N recommendation system, which deals with finding a
set of N items that best match a user profile. The user profile can be defined
by liked and disliked items. We assume that a user might like items that are
similar or related to his/her liked ones. Therefore, the recommendation task can
be re-modeled as finding a ranked list of items that are more related to the
liked items than to the disliked ones.</p>
      <p>
        In recent years, there have been several efforts to utilize external knowledge
bases such as Wikipedia and DBpedia (http://wiki.dbpedia.org/) for recommendation. Most of this work
focuses on boosting collaborative filtering approaches or on improving content-based
systems [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In particular, these approaches have shown some benefits in addressing the cold-start
and data sparsity issues of conventional collaborative filtering methods. Ostuni
et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] have shown the effectiveness of using Linked Open Data in boosting a
collaborative filtering method for Top-N movie recommendation.
      </p>
      <p>
        In this paper, we present a Top-N book recommendation system that calculates
the semantic relatedness of a given book to the liked and disliked books. Wikipedia
contains information about thousands of books and their authors. Every book
can be seen as a Wikipedia entity. Therefore, to compute the
semantic relatedness score between two books, we can calculate the relatedness score
between their corresponding Wikipedia entities (articles). We use
Wikipedia-based Distributional Semantics for Entity Relatedness (DiSER) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to calculate
the relatedness score between two books and perform book recommendation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
DiSER calculates the relatedness scores by building distributional vectors over
Wikipedia articles. Although Aggarwal and Buitelaar [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] have shown that it
significantly improves over other existing methods of computing entity relatedness
such as ESA [
        <xref ref-type="bibr" rid="ref1 ref5">1, 5</xref>
        ] and KORE [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we also perform experiments with those other
methods for the Top-N book recommendation task in this paper.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <sec id="sec-2-1">
        <title>Top N Recommendation</title>
        <p>A user profile is defined by a few liked and disliked books. The task is to find a
ranked list of N other books that the user might like. We compute the relatedness
scores of a given book with the liked and disliked books. We recommend the
book only if the score for the like prediction is greater than that for the dislike prediction. Since
a user can like or dislike more than one book, we need to aggregate the relatedness
scores to obtain final confidence scores for the like and dislike predictions. Therefore,
we use three methods to aggregate the relatedness scores.</p>
        <p>Average: We calculate the relatedness scores of the given book with all the books
liked by the user, and obtain a confidence score by taking the average of these
scores. Similarly, we calculate the confidence score for disliked items.</p>
        <p>Maximum: We calculate the relatedness scores of the given book with all the books
liked by the user. Unlike in the Average case, we choose the relatedness score
of the most related pair as the confidence score. Similarly, we calculate the
confidence score for disliked items.</p>
        <p>Random: We randomly select one book from all the books liked by the user, and
one from all the disliked ones. We calculate the relatedness scores of the given book
with the randomly selected liked and disliked books. The obtained relatedness
scores are taken as the confidence scores for the corresponding classes.</p>
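        <p>The three aggregation methods and the recommendation rule can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in our experiments: the function names confidence and top_n, the relatedness callable (any function returning a score for a pair of books), and the choice to rank the recommended books by their like confidence score are assumptions made for the example.</p>
        <preformat>
```python
import random
from statistics import mean


def confidence(candidate, books, relatedness, method="average", rng=None):
    """Aggregate the relatedness of `candidate` to `books` into one score."""
    if method == "random":
        # Random: relatedness to a single randomly chosen book.
        rng = rng or random.Random()
        return relatedness(candidate, rng.choice(books))
    scores = [relatedness(candidate, book) for book in books]
    if method == "average":
        return mean(scores)
    if method == "maximum":
        # Maximum: score of the most related pair.
        return max(scores)
    raise ValueError(f"unknown aggregation method: {method}")


def top_n(candidates, liked, disliked, relatedness, n, method="average", rng=None):
    """Return up to n candidates whose like score exceeds their dislike score."""
    recommended = []
    for book in candidates:
        like = confidence(book, liked, relatedness, method, rng)
        dislike = confidence(book, disliked, relatedness, method, rng)
        if like > dislike:
            recommended.append((like, book))
    # Assumption: rank by like confidence, highest first.
    recommended.sort(key=lambda pair: pair[0], reverse=True)
    return [book for _, book in recommended[:n]]
```
        </preformat>
        <p>Here liked and disliked are non-empty lists of book identifiers, and passing a seeded random.Random instance as rng makes the Random aggregation reproducible.</p>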
      </sec>
      <sec id="sec-2-2">
        <title>Computing Semantic Relatedness</title>
        <p>DiSER generates a high dimensional vector by taking every Wikipedia article as
a dimension, and considers the associativity weight of an entity with the article
as the magnitude of the corresponding dimension. To measure the semantic
relatedness between two entities, it computes the cosine score between their
corresponding DiSER vectors.</p>
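        <p>DiSER retrieves a list of relevant Wikipedia articles and ranks them according
to their relevance scores with the given entity. It considers only the human
annotated entities in Wikipedia, thus keeping only the canonical entities that
appear with hyperlinks in Wikipedia articles. The tf-idf weight of an entity with
every Wikipedia article is calculated and used to build a semantic vector. The
semantic vector of an entity is represented by the retrieved Wikipedia concepts
sorted by their tf-idf scores. For instance, for an entity <inline-formula><tex-math>e</tex-math></inline-formula>, DiSER builds a
semantic vector <inline-formula><tex-math>v</tex-math></inline-formula>, where <inline-formula><tex-math>v = \sum_{i=0}^{N} a_i c_i</tex-math></inline-formula>, <inline-formula><tex-math>c_i</tex-math></inline-formula> is the <inline-formula><tex-math>i</tex-math></inline-formula>th concept in the Wikipedia
concept space, and <inline-formula><tex-math>a_i</tex-math></inline-formula> is the tf-idf weight of the entity <inline-formula><tex-math>e</tex-math></inline-formula> with the concept <inline-formula><tex-math>c_i</tex-math></inline-formula>.
Here, <inline-formula><tex-math>N</tex-math></inline-formula> represents the total number of Wikipedia articles.</p>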
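        <p>The cosine score between two such vectors can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each vector is stored sparsely as a dictionary mapping a Wikipedia article identifier to the tf-idf weight of the entity with that article, and the function name cosine and the zero score for an empty vector are choices made for the example, not part of DiSER itself.</p>
        <preformat>
```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {article_id: weight}."""
    # Iterate over the shorter vector; missing dimensions count as zero.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(weight * v.get(article, 0.0) for article, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an entity with no weighted articles is unrelated to everything.
        return 0.0
    return dot / (norm_u * norm_v)
```
        </preformat>
        <p>Since most entities have non-zero weights for only a small fraction of all Wikipedia articles, a sparse representation of this kind avoids materializing one dimension per article.</p>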
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>To evaluate our approach, we perform experiments on a dataset provided
in the "Linked Open Data-enabled Recommender Systems" challenge (http://challenges.2014.eswc-conferences.org/index.php/RecSys). There were
three different tasks, of which task 2 was "Top-N recommendation from binary user
feedback". The dataset consists of 6,181 users, 8,171 books and 67,990 user-item
pairs for which the rating is to be predicted. All the books are provided with their corresponding DBpedia
and Wikipedia links.</p>
        <p>Table: Precision, Recall and F1 for each entity relatedness measure (ESA, Context-VSM and DiSER) under the Average and Maximum aggregation methods.</p>
        <p>
          We performed experiments with three different relatedness measures: ESA,
Context-VSM and DiSER. Similar to DiSER, ESA computes the relatedness score by
taking the distance between two high-dimensional vectors built over Wikipedia.
However, unlike DiSER, it does not perform any specific feature selection for
entity relatedness. Thus, it considers only the surface form of an entity and does
not differentiate between two entities with the same surface form. For instance,
ESA builds the same vector for "Harry Potter (film series)" and "Harry
Potter (book series)", as it generates the vector for their surface form, i.e. "Harry
Potter". We compute the ESA score between the book titles. Context-VSM is
similar to KORE [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], which computes keyphrase overlap between the contents of the
corresponding Wikipedia articles. We performed experiments with these three
relatedness measures for our recommendation approach by using the three
aggregation methods described above: Average, Maximum and Random.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We presented our approach to Top-N book recommendation. We reported the
results of three different relatedness measures. DiSER outperformed the other two
methods of computing relatedness scores. Further, we showed that random
aggregation achieves comparable scores. Thus, we can conclude that Wikipedia is a
valuable resource for obtaining recommendations of popular books in a
cold-start scenario. Future work will include the investigation of other relatedness
measures to boost conventional recommendation methods such as collaborative
filtering.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>N.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Asooja</surname>
          </string-name>
          , G. Bordea, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          .
          <article-title>Non-orthogonal explicit semantic analysis</article-title>
          .
          <source>Lexical and Computational Semantics (*SEM</source>
          <year>2015</year>
          ),
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>N.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Asooja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ziad</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          .
          <article-title>Who are the american vegans related to brad pitt?: Exploring related entities</article-title>
          .
          <source>In Proceedings of the 24th International Conference on World Wide Web Companion</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>N.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          .
          <article-title>Wikipedia-based distributional semantics for entity relatedness</article-title>
          .
          <source>In 2014 AAAI Fall Symposium Series</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>N.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Blanco</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          .
          <article-title>Insights into entity recommendation in web search</article-title>
          .
          <source>In Proceedings of the Intelligent Exploration of Semantic Data, ISWC</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>E.</given-names>
            <surname>Gabrilovich</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Markovitch</surname>
          </string-name>
          .
          <article-title>Computing semantic relatedness using wikipedia-based explicit semantic analysis</article-title>
          .
          <source>In IJCAI'07</source>
          , pages
          <fpage>1606</fpage>
          –
          <fpage>1611</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. J. Hoffart, S. Seufert,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theobald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          . Kore:
          <article-title>Keyphrase overlap relatedness for entity disambiguation</article-title>
          .
          <source>In Proceedings of the 21st ACM international conference on Information and knowledge management</source>
          , pages
          <volume>545</volume>
          –
          <fpage>554</fpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>V. C.</given-names>
            <surname>Ostuni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Di</given-names>
            <surname>Noia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Di</given-names>
            <surname>Sciascio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mirizzi</surname>
          </string-name>
          .
          <article-title>Top-n recommendations from implicit feedback leveraging linked open data</article-title>
          .
          <source>In 7th ACM RecSys</source>
          , pages
          <volume>85</volume>
          –
          <fpage>92</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lops</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          , and M. de Gemmis.
          <article-title>Knowledge infusion into content-based recommender systems</article-title>
          .
          <source>In 3rd ACM RecSys</source>
          , pages
          <volume>301</volume>
          –
          <fpage>304</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>