<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SemRevRec: A Recommender System based on User Reviews and Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Iacopo Vagliano</string-name>
          <email>i.vagliano@zbw.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Monti</string-name>
          <email>diego.monti@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Morisio</string-name>
          <email>maurizio.morisio@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Torino</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ZBW - Leibniz Information Centre for Economics</institution>
          ,
          <addr-line>Kiel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>Traditionally, recommender systems exploit user ratings to infer preferences. However, the growing popularity of social platforms has encouraged users to write textual reviews about liked items. These reviews represent a valuable source of non-trivial information that could improve users' decision processes. In this paper we propose a novel recommendation approach based on the semantic annotation of entities mentioned in user reviews and on the knowledge available in the Web of Data. We compared our recommender system with two baseline algorithms and a state-of-the-art Linked Data-based approach. Our system provided more diverse recommendations than the other techniques considered, while obtaining better accuracy than the Linked Data-based method.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>
        Currently, most recommender systems exploit user ratings to
infer preferences, although the growing popularity of social and
e-commerce websites has encouraged users to write textual reviews.
These reviews enable recommender systems to represent the
multifaceted nature of users’ opinions and to build a fine-grained preference
model, which cannot be obtained from overall ratings alone [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>In this paper we describe how the information extracted from
user reviews, combined with Linked Data, can be exploited in
recommendation tasks. On the one hand, Linked Data can provide a rich
representation of the items to recommend, since they describe items
through many features, such as their directors and actors. On the other hand, reviews may reveal additional
connections among items: for instance, various reviews of
Interstellar mention Stanley Kubrick, although there is no direct link
between these two resources in DBpedia. We propose a novel
recommendation approach based on the semantic annotation of reviews
to extract useful information from them. A preliminary offline study
suggests that our method provides better prediction and ranking
accuracy than another recommender system based on Linked Data,
while it increases the diversity of recommendations with respect to
all the techniques considered.</p>
    </sec>
    <sec id="sec-2">
      <title>2 APPROACH</title>
      <p>SemRevRec consists of two main modules: semantic annotation
and discovery, and recommendation. The former is responsible for
feeding the recommendation module with semantically annotated
entities and Linked Data, while the latter provides suggestions to
users. The two modules are decoupled: the recommendation
module works online, while the annotation and discovery module works offline and provides
the entities that can be recommended. Every time a new review
is submitted, the system can repeat the semantic annotation and
discovery steps and possibly identify new entities.</p>
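      <p>The decoupling between the offline and online modules can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: the class <monospace>EntityStore</monospace>, its methods, and the injected <monospace>annotate</monospace> and <monospace>discover</monospace> callables are hypothetical stand-ins for the annotator and the discoverer described in the following subsection.</p>

```python
from collections import defaultdict
from typing import Callable, Iterable


class EntityStore:
    """Hypothetical store shared by the offline and online modules."""

    def __init__(self) -> None:
        # reviewed item URI -> {annotated entity URI -> occurrence count}
        self.annotated: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
        # annotated entity URI -> set of entity URIs discovered through it
        self.discovered: dict[str, set[str]] = defaultdict(set)

    # Offline module: called whenever a new review is submitted.
    def add_review(
        self,
        item_uri: str,
        text: str,
        annotate: Callable[[str], Iterable[str]],
        discover: Callable[[str], Iterable[str]],
    ) -> None:
        for entity in annotate(text):
            self.annotated[item_uri][entity] += 1
            # Repeat discovery so that newly annotated entities are expanded.
            self.discovered[entity].update(discover(entity))

    # Online module: only reads what the offline module has stored.
    def entities_for(self, item_uri: str) -> set[str]:
        related = set(self.annotated.get(item_uri, {}))
        for entity in list(related):
            related |= self.discovered.get(entity, set())
        return related
```

      <p>With this split, the online path performs only dictionary lookups, while all annotation and SPARQL work stays in the offline path.</p>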
      <p>Although our approach is not bound to a particular domain,
our implementation focuses on movies, so we exploited reviews from IMDb.
We chose DBpedia for annotation and discovery
because it is one of the main datasets in the Web of Data.</p>
    </sec>
    <sec id="sec-3">
      <title>Semantic Annotation and Discovery</title>
      <p>Semantic annotation associates a URI with each entity
recognized in a given text, adding information about its meaning.
In our case, the entities identified in the reviews are resources in the
Web of Data. Thus, the semantic annotation and discovery module
can find other resources that are linked to the annotated entities,
enabling our system to recommend more items.</p>
      <p>
        In our implementation, we relied on AIDA [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to annotate
reviews with DBpedia resources. We exploited the DBpedia
properties dbo:starring and dbo:director for discovering, through
SPARQL queries, additional resources that are connected with the
annotated entities. The underlying hypothesis is that most of the
entities, if not movies, should be actors or directors. However, these
properties can be configured according to the domain and the
dataset considered. Given the annotated entities, the discoverer
retrieves other relevant entities. This allows the system to discover
other movies by the same director or with the same actor named in a given
review, and thus to improve the accuracy of the
recommendations. For example, if Christopher Nolan was annotated in a review of The
Dark Knight, Interstellar can be found because it is directed by him.
      </p>
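      <p>The discovery step can be illustrated by a query builder. The sketch below shows one possible query, assumed rather than taken from SemRevRec: the function name <monospace>build_discovery_query</monospace> is hypothetical, the annotated entity is passed as a prefixed name such as <monospace>dbr:Christopher_Nolan</monospace>, and the <monospace>dbr:</monospace> and <monospace>dbo:</monospace> prefixes are assumed to be predefined by the endpoint, as they are on the public DBpedia SPARQL endpoint. The query returns items that point to the entity through one of the configured properties (the entity is a director or actor) or that share a director or actor with it (the entity is itself a movie).</p>

```python
from typing import Sequence

# Properties named in the text; configurable per domain and dataset.
DEFAULT_PROPERTIES = ("dbo:director", "dbo:starring")


def build_discovery_query(entity: str, properties: Sequence[str] = DEFAULT_PROPERTIES) -> str:
    """Build a SPARQL query that discovers items connected to `entity`.

    `entity` is a prefixed name such as "dbr:Christopher_Nolan"; the
    dbr:/dbo: prefixes are assumed to be predefined by the endpoint.
    """
    values = " ".join(properties)
    return f"""SELECT DISTINCT ?item WHERE {{
  VALUES ?p {{ {values} }}
  {{
    # The entity is a director or actor: items that point to it.
    ?item ?p {entity} .
  }} UNION {{
    # The entity is an item: other items sharing a director or actor.
    {entity} ?p ?person .
    ?item ?p ?person .
    FILTER (?item != {entity})
  }}
}}"""
```

      <p>The returned string can be sent to the endpoint with any SPARQL client. Local names containing characters such as parentheses (e.g. <monospace>Interstellar_(film)</monospace>) would need escaping or a full IRI, which this sketch does not handle.</p>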
      <p>
        The semantic annotation and discovery module stores both
annotated and discovered entities. The URI of each annotated entity
is associated with the URI of the reviewed item and with the
number of occurrences of that entity in all the reviews of that item. The same entity
may, in fact, appear in reviews of different items. Similarly,
the URI of each discovered entity is stored together with the URI
of the annotated entity through which it was discovered and with their
Linked Data Semantic Distance (LDSD) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a measure inversely related to the number of links
between two resources.
      </p>
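      <p>The LDSD stored for each discovered entity can be illustrated with a small sketch. It assumes the unweighted, combined form of the measure as we understand it from Passant [<xref ref-type="bibr" rid="ref5">5</xref>]: the distance is 1 / (1 + L), where L counts the distinct properties that link the two resources directly in either direction, plus the (property, resource) pairs shared by their outgoing or by their incoming links. The graph is a hypothetical in-memory list of triples rather than DBpedia itself, the weighted variant is not shown, and the function name <monospace>ldsd</monospace> is illustrative.</p>

```python
from typing import Iterable

Triple = tuple[str, str, str]  # (subject, predicate, object)


def ldsd(a: str, b: str, triples: Iterable[Triple]) -> float:
    """Simplified, unweighted Linked Data Semantic Distance between a and b."""
    graph = set(triples)
    # Direct links: distinct properties p with (a, p, b) or (b, p, a).
    direct_ab = {p for s, p, o in graph if s == a and o == b}
    direct_ba = {p for s, p, o in graph if s == b and o == a}
    # Indirect outgoing links: pairs (p, n) with a -p-> n and b -p-> n.
    out_a = {(p, o) for s, p, o in graph if s == a}
    out_b = {(p, o) for s, p, o in graph if s == b}
    # Indirect incoming links: pairs (p, n) with n -p-> a and n -p-> b.
    in_a = {(p, s) for s, p, o in graph if o == a}
    in_b = {(p, s) for s, p, o in graph if o == b}
    links = (
        len(direct_ab)
        + len(direct_ba)
        + len(out_a.intersection(out_b))
        + len(in_a.intersection(in_b))
    )
    # More links -> smaller distance; unconnected resources -> 1.0.
    return 1.0 / (1 + links)
```

      <p>With toy triples stating that both The Dark Knight and Interstellar are directed by Christopher Nolan and star Michael Caine, the two films share two outgoing links, giving a distance of 1/3; two resources with no links get the maximum distance of 1.</p>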
    </sec>
    <sec id="sec-4">
      <title>Recommendation</title>
      <p>The recommendation process consists of two main steps: the
generation of the candidate recommendations and their ranking. Given
an initial item, SemRevRec retrieves all the entities related to it.
First, the system selects the annotated entities that were
mentioned in the reviews of the initial item. Afterward, it obtains the
items that mention the initial item, i.e., items whose reviews
generated an annotated entity corresponding to the initial item.
For example, if the initial item is Interstellar and a review of 2001:
A Space Odyssey mentions Interstellar, then 2001: A Space Odyssey is
considered as a candidate recommendation.</p>
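      <p>The first candidate-generation step can be sketched as two lookups over the stored annotations. In this illustrative sketch, <monospace>annotations</monospace> is a hypothetical dictionary mapping each reviewed item to the occurrence counts of the entities annotated in its reviews, and <monospace>annotation_candidates</monospace> is an assumed name. The text does not say whether non-movie entities are filtered out at this point, so the sketch keeps them.</p>

```python
def annotation_candidates(initial: str, annotations: dict[str, dict[str, int]]) -> set[str]:
    """Candidates obtained from annotated entities (hypothetical helper)."""
    # (a) Entities annotated in the reviews of the initial item.
    candidates = set(annotations.get(initial, {}))
    # (b) Items whose reviews mention the initial item.
    for item, entities in annotations.items():
        if item != initial and initial in entities:
            candidates.add(item)
    candidates.discard(initial)
    return candidates
```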
      <p>Subsequently, SemRevRec retrieves the relevant discovered
entities. These can be entities discovered through the initial item.
For instance, if the initial item is Interstellar and The Dark Knight
was previously discovered because both movies were
directed by Christopher Nolan, The Dark Knight is selected. Similarly,
the entities discovered through other entities which were annotated
in the reviews of the initial item are relevant. E.g., if Interstellar is
the initial item, Stanley Kubrick was annotated in one of its reviews,
and 2001: A Space Odyssey was discovered through Stanley Kubrick,
then 2001: A Space Odyssey is a candidate recommendation.</p>
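      <p>The retrieval of discovered entities can be sketched in the same way. Here <monospace>discovered_via</monospace> is a hypothetical dictionary mapping each entity to the set of entities discovered through it, and <monospace>discovery_candidates</monospace> is an assumed name; the toy identifiers mirror the examples in the paragraph above.</p>

```python
def discovery_candidates(
    initial: str,
    annotations: dict[str, dict[str, int]],
    discovered_via: dict[str, set[str]],
) -> set[str]:
    """Candidates obtained from discovered entities (hypothetical helper)."""
    # Entities discovered through the initial item itself.
    candidates = set(discovered_via.get(initial, set()))
    # Entities discovered through entities annotated in the initial item's reviews.
    for entity in annotations.get(initial, {}):
        candidates |= discovered_via.get(entity, set())
    candidates.discard(initial)
    return candidates
```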
      <p>Finally, SemRevRec ranks the candidate recommendations. The
ranking function (Equation 1) considers the number of occurrences
<italic>occurrence</italic><sub>i</sub> of entities in the reviews and the
Linked Data Semantic Distance (LDSD) between each discovered entity and the entity through
which it was discovered. The LDSD term avoids assigning the same score to
all the entities discovered through the same annotated entity. The
item i can be an annotated or a discovered entity. The α coefficient
is 1 if the item i is an annotated entity, while it can be configured
to a custom value for the discovered entities (0.5 by default). For
the discovered entities, the occurrence of the entity through which
they were discovered is used, multiplied by α. To obtain a value
between 0 and 1, the occurrence is normalized with respect to the
maximum occurrence of the entities j that belong to the candidate
recommendation set CR. The β coefficient is 1 if i is an annotated
entity, 0.5 otherwise. The γ coefficient is 0.5 for discovered entities,
0 otherwise. In this way, provided that α does not exceed 1, the function returns a number between
0 and 1, which is equal to the first term for the annotated entities,
while, for the discovered entities, it is the average of the
normalized occurrence term and 1 − LDSD(i, i<sub>o</sub>), where i<sub>o</sub> is the entity through which
it was discovered.</p>
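      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math>R(i) = \beta \cdot \frac{\alpha \cdot \mathit{occurrence}_i}{\max_{j \in CR} \mathit{occurrence}_j} + \gamma \cdot \left(1 - \mathrm{LDSD}(i, i_o)\right)</tex-math>
      </disp-formula>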
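      <p>Equation 1 can be transcribed directly into code. This is a minimal sketch with hypothetical names (<monospace>Candidate</monospace>, <monospace>rank</monospace>); it assumes that the normalizer is the maximum raw occurrence count over the candidate set, without applying α, since the text does not specify this detail, and that α lies between 0 and 1.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A candidate recommendation (hypothetical structure)."""

    uri: str
    # Own occurrence count for annotated entities; for discovered entities,
    # the occurrence count of the entity i_o through which they were found.
    occurrence: int
    # LDSD(i, i_o) for discovered entities; None marks an annotated entity.
    ldsd: Optional[float] = None


def rank(candidates: list[Candidate], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Score candidates with Equation (1), highest score first."""
    if not candidates:
        return []
    # Normalizer: maximum occurrence over the candidate set CR.
    max_occ = max(c.occurrence for c in candidates) or 1
    scored = []
    for c in candidates:
        if c.ldsd is None:  # annotated: alpha = beta = 1, gamma = 0
            a, beta, gamma, distance = 1.0, 1.0, 0.0, 0.0
        else:  # discovered: configurable alpha, beta = gamma = 0.5
            a, beta, gamma, distance = alpha, 0.5, 0.5, c.ldsd
        score = beta * (a * c.occurrence / max_occ) + gamma * (1.0 - distance)
        scored.append((c.uri, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

      <p>For instance, with two annotated candidates occurring 4 and 2 times and a discovered candidate whose originating entity occurs 4 times at LDSD 0.2, the scores are 1.0, 0.5, and 0.5 · 0.5 + 0.5 · 0.8 = 0.65, respectively.</p>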
    </sec>
    <sec id="sec-5">
      <title>3 EVALUATION</title>
      <p>We evaluated SemRevRec with a preliminary offline experiment
conducted in the movie domain, in order to compare our
proposal with a state-of-the-art recommender system based on
Linked Data and with two baseline algorithms. We annotated the reviews
available on IMDb for the top-250 movies. We also relied on the
MovieLens 1M dataset for obtaining the actual user ratings.</p>
      <p>
        The evaluation was performed with LibRec. We executed a
5-fold cross-validation, considering as positive the ratings greater than
3 on a scale from 1 to 5. Using the top-10 recommendations for each
user, we computed precision, recall, nDCG,
Entropy-Based Novelty (EBN) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and diversity [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We compared our
technique with the Most Popular and the Random Guess baseline
algorithms, and with SPrank [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We configured SPrank to use
LambdaMART as the ranking method and the properties related to the
movie domain (dct:subject, dbo:director, and dbo:starring).
      </p>
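      <p>The accuracy side of this protocol can be illustrated with a small per-user sketch. It is not LibRec's implementation: the function names are hypothetical, relevance is binary (test ratings greater than 3), and nDCG uses the common log2 position discount, which may differ in detail from LibRec's definition. EBN and diversity are not shown.</p>

```python
import math


def relevant_items(test_ratings: dict[str, float], threshold: float = 3.0) -> set[str]:
    """Items in a user's test fold rated above the threshold (4 or 5 on a 1-5 scale)."""
    return {item for item, rating in test_ratings.items() if rating > threshold}


def precision_recall_ndcg(
    ranked: list[str], relevant: set[str], k: int = 10
) -> tuple[float, float, float]:
    """Precision@k, recall@k, and binary-relevance nDCG@k for one user."""
    hits = [1.0 if item in relevant else 0.0 for item in ranked[:k]]
    precision = sum(hits) / k
    recall = sum(hits) / len(relevant) if relevant else 0.0
    # DCG discounts a hit at 0-based position pos by log2(pos + 2).
    dcg = sum(h / math.log2(pos + 2) for pos, h in enumerate(hits))
    # Ideal DCG: all relevant items (up to k) placed at the top.
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    ndcg = dcg / ideal if ideal else 0.0
    return precision, recall, ndcg
```

      <p>Averaging these per-user values over all users and the five folds yields the aggregate figures that a toolkit such as LibRec reports.</p>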
      <p>Table 1 lists the results of the experiment. For all measures
except EBN, higher values indicate better results, while lower EBN values
indicate higher novelty. SemRevRec provided better prediction
accuracy and ranking than SPrank, and it improved novelty with
respect to the Most Popular technique. However, SPrank obtained
a higher novelty than SemRevRec. The diversity of the algorithms
was similar, although our system achieved the highest diversity.</p>
    </sec>
    <sec id="sec-6">
      <title>4 CONCLUSIONS AND FUTURE WORK</title>
      <p>In this paper we proposed a novel recommendation approach based
on the semantic annotation of reviews to extract information as
Linked Data. Our method discovers additional resources and
generates recommendations by exploiting the annotated entities. A
preliminary offline study conducted in the movie domain suggested
that our algorithm provides better prediction accuracy and ranking
than another method based on Linked Data, while it increases the
diversity of recommendations with respect to the other techniques
considered. Although we have tested our approach in only one
domain, it could be applied to others, provided that reviews are available. As future
work, we plan to evaluate SemRevRec in other domains, such as
music and books, and to also consider, during ranking, the sentiment
and the linking confidence associated with the annotated entities.</p>
    </sec>
    <sec id="sec-7">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported by the EU’s Horizon 2020 programme
under grant agreement H2020-693092 MOVING.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Alejandro</given-names>
            <surname>Bellogín</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Iván</given-names>
            <surname>Cantador</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Pablo</given-names>
            <surname>Castells</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>A Study of Heterogeneity in Recommendations for a Social Music Service</article-title>
          .
          <source>In Proceedings of the 1st International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec '10)</source>
          . ACM,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . https://doi.org/10.1145/1869446.1869447
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Li</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Guanliang</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Feng</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Recommender systems based on user reviews: The state of the art</article-title>
          .
          <source>User Modeling and User-Adapted Interaction</source>
          <volume>25</volume>
          ,
          <issue>2</issue>
          (
          <year>2015</year>
          ),
          <fpage>99</fpage>
          -
          <lpage>154</lpage>
          . https://doi.org/10.1007/s11257-015-9155-5
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Hoffart</surname>
          </string-name>
          , Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Robust Disambiguation of Named Entities in Text</article-title>
          .
          <source>In Conference on Empirical Methods in Natural Language Processing, EMNLP</source>
          <year>2011</year>
          , Edinburgh, Scotland.
          <fpage>782</fpage>
          -
          <lpage>792</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Tommaso</given-names>
            <surname>Di</surname>
          </string-name>
          <string-name>
            <surname>Noia</surname>
          </string-name>
          , Vito Claudio Ostuni, Paolo Tomeo, and Eugenio Di Sciascio.
          <year>2016</year>
          .
          <article-title>SPrank: Semantic Path-Based Ranking for Top-N Recommendations Using Linked Open Data</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology 8</source>
          ,
          <issue>1</issue>
          (
          <year>2016</year>
          ),
          <volume>9</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          :
          <fpage>34</fpage>
          . https://doi.org/10.1145/2899005
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Alexandre</given-names>
            <surname>Passant</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>dbrec - Music Recommendations Using DBpedia</article-title>
          .
          <source>In The Semantic Web - ISWC 2010</source>
          . Springer Berlin Heidelberg,
          <fpage>209</fpage>
          -
          <lpage>224</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Mi</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Neil</given-names>
            <surname>Hurley</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Avoiding Monotony: Improving the Diversity of Recommendation Lists</article-title>
          .
          <source>In Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys '08)</source>
          . ACM, New York, NY, USA,
          <fpage>123</fpage>
          -
          <lpage>130</lpage>
          . https://doi.org/10.1145/1454008.1454030
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>