=Paper=
{{Paper
|id=Vol-1959/paper-02
|storemode=property
|title=Knowledge-enabled Recommender Systems: Models, Challenges, Solutions
|pdfUrl=https://ceur-ws.org/Vol-1959/paper-02.pdf
|volume=Vol-1959
|authors=Tommaso Di Noia
|dblpUrl=https://dblp.org/rec/conf/kdweb/Noia17
}}
==Knowledge-enabled Recommender Systems: Models, Challenges, Solutions==
<pdf width="1500px">https://ceur-ws.org/Vol-1959/paper-02.pdf</pdf>
<pre>
      Knowledge-enabled Recommender Systems:
            Models, Challenges, Solutions
               (Extended Abstract)*

                                  Tommaso Di Noia

                       SisInf Lab, Polytechnic University of Bari
                              tommaso.dinoia@poliba.it


Introduction

Together with the rapid growth of the information we produce every day, we have
witnessed the flourishing of new tools and techniques that aim to help users
access such information in a personalized way. In the Information Overload era
we are living in, the amount of information exceeds users’ capability to
process and use it [39]. A huge and fast-growing number of possibilities
overwhelms users, leading them to make poor decisions and to feel anxious and
dissatisfied [37]. Recommender Systems (RSs) [32] are a family of information
filtering tools that have proven valuable in helping users find, in a
personalized manner, what is relevant to them in such overflowing, complex
information spaces. They provide users with personalized access to large
collections of resources.
     The main task of a recommendation engine is typically to estimate the
relevance of unknown items for a target user and to recommend the Top-N items,
that is, the N items with the highest utility for that user [32]. Typically,
the utility of an item is represented by a numerical rating, which indicates how
much a particular user liked the item. However, the utility is not defined for
every user-item pair; usually it is available only for a small subset of
them. This subset is the input of a generic recommender system, whose
objective is to estimate the utility of all the remaining pairs. A formal
definition of the recommendation problem is given in [1] and reads as follows.

Definition 1. Let U represent the set of users and I the set of items in the
system. Potentially, both sets can be very large. Let f : U × I → R, where R is
a totally ordered set, be a utility function measuring the relevance of item i ∈ I
for user u ∈ U. Then, the recommendation problem consists in finding for each
user u such item i_{max,u} ∈ I maximizing the utility function f. More formally,
this corresponds to the following:

                         ∀u ∈ U, i_{max,u} = arg max_{i ∈ I} f(u, i)
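
As a minimal illustration of Definition 1, the following Python sketch selects
the item maximizing a utility function and, more generally, the Top-N items.
The toy utility table and the names utility, best_item and top_n are
assumptions made for this example only; in a real system f is estimated from
the known ratings rather than given.

```python
from typing import Callable, Iterable, List

Utility = Callable[[str, str], float]

def best_item(f: Utility, user: str, items: Iterable[str]) -> str:
    """Return the item maximizing f(user, item) over the given items."""
    return max(items, key=lambda i: f(user, i))

def top_n(f: Utility, user: str, items: Iterable[str], n: int) -> List[str]:
    """Return the n items with the highest utility for `user`, best first."""
    return sorted(items, key=lambda i: f(user, i), reverse=True)[:n]

# Hypothetical utility values (toy data, not taken from the paper).
UTILITY = {("alice", "i1"): 3.0, ("alice", "i2"): 4.5, ("alice", "i3"): 1.0}

def utility(user: str, item: str) -> float:
    """Look up the toy utility; unknown pairs default to 0.0."""
    return UTILITY.get((user, item), 0.0)

ITEMS = ["i1", "i2", "i3"]
print(best_item(utility, "alice", ITEMS))
print(top_n(utility, "alice", ITEMS, 2))
```
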
*
    This is an extended abstract of the keynote lecture given at the Third International
    Workshop on Knowledge Discovery on the Web (KDWeb 2017).
Depending on the way the utility function is estimated and on the availability of
additional data, for example about the characteristics of items, there are different
types of recommendation techniques. The two main ones are collaborative filtering
and content-based recommendation. A complete list of collaborative filtering and
content-based techniques, together with their hybrid versions, is given in [5] and [32].

Collaborative Filtering Recommendation. Collaborative Filtering (CF) is the process
of filtering or evaluating items using the opinions of other people [36]. According
to [4], there are two main types of collaborative filtering methods: memory-based
and model-based. Memory-based CF relies on a particular Machine Learning
method, the k-nearest neighbours (k-NN) algorithm. It does not
require any preliminary model building phase, since predictions are made by
aggregating the ratings of the closest neighbours. Conversely, model-based
techniques first learn a predictive model, which is then used to make predictions.
Memory-based approaches can be classified as either user-based [15] or
item-based [35].
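
The memory-based, user-based variant can be sketched as follows. This is only
one possible instance, under stated assumptions: the ratings are toy data,
similarity is the cosine over co-rated items, and the prediction is a
similarity-weighted average over the k nearest neighbours. Other similarity
measures and aggregation schemes are equally common.

```python
import math

# Toy ratings (hypothetical data): user -> {item: rating}.
RATINGS = {
    "u1": {"a": 5.0, "b": 3.0, "c": 4.0},
    "u2": {"a": 5.0, "b": 3.0, "d": 5.0},
    "u3": {"a": 1.0, "b": 5.0, "d": 2.0},
}

def cosine(r1, r2):
    """Cosine similarity computed over the items rated by both users."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    n2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (n1 * n2)

def predict(ratings, user, item, k=2):
    """Similarity-weighted average of the ratings given to `item` by the
    k users most similar to `user`, among those who rated `item`."""
    neighbours = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    neighbours = sorted(neighbours, reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no usable neighbour
    return sum(sim * rating for sim, rating in neighbours) / total

# u2 rates like u1, so the prediction for item "d" leans towards u2's rating.
print(predict(RATINGS, "u1", "d"))
```

No model is built in advance: every prediction is computed directly from the
stored ratings, which is what distinguishes this family from model-based CF.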

Content-based Recommendation. Content-based RSs recommend an item to a
user based upon a description of the item and a profile of the user’s interests [30].
Briefly, the basic process performed by a content-based recommender consists
in matching the attributes of a user profile against the attributes of a content
object (item) [8]. Unlike collaborative filtering, this recommendation
approach relies on the availability of content features describing the items. A
high-level architecture of a content-based RS is presented in [8]. There are two
main content-based recommendation approaches: heuristic-based and model-based.
Approaches using heuristic functions have their roots in Information Retrieval (IR)
and Information Filtering. Items are recommended based on a comparison
between their content and a user profile. The idea is to represent both items and
users using typical IR techniques [3], e.g. vectors of terms, and to compute a match
between their representations. The user profile consists of a vector of terms built
from the analysis of the items liked by the user. Model-based approaches [29] use
Machine Learning techniques to learn a model of the user’s preferences by
analyzing the content characteristics of the items the user rated. Content-based
methods can have several limitations. The main one is content
overspecialization, that is, the inability of the system to recommend
relevant items that are different from the ones the user already knows. For a
complete and detailed description of content-based recommendation techniques,
refer to [8,30].
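
The heuristic-based approach can be sketched with plain term vectors. In this
minimal example, which assumes raw term frequencies and cosine similarity (one
common IR choice among many), the user profile is the sum of the vectors of
the liked items. The item descriptions are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words vector: term -> frequency."""
    return Counter(text.lower().split())

def cosine(v1, v2):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(v1[t] * v2[t] for t in v1 if t in v2)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def build_profile(liked_descriptions):
    """User profile: sum of the term vectors of the liked items."""
    profile = Counter()
    for text in liked_descriptions:
        profile.update(term_vector(text))
    return profile

def rank(profile, candidates):
    """Candidate item ids ordered by decreasing similarity to the profile."""
    return sorted(
        candidates,
        key=lambda i: cosine(profile, term_vector(candidates[i])),
        reverse=True,
    )

# Hypothetical item descriptions.
LIKED = ["space adventure with robots", "robots fight in space"]
CANDIDATES = {
    "m1": "romantic comedy in paris",
    "m2": "space robots saga",
    "m3": "documentary about paris in spring",
}
print(rank(build_profile(LIKED), CANDIDATES))
```

The item sharing the most terms with the liked items is ranked first, which
also hints at overspecialization: items unlike those already known to the user
can hardly reach the top of the list.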


Knowledge-enabled Recommender Systems

In content-based approaches, the information exploited by recommendation
engines is very often encoded as a bag of words or, more recently, as word
embeddings [20]. In all these cases, no explicit semantics is associated with the
contextual data. Nowadays, Linked Open Data (LOD) datasets represent a huge
repository of different kinds of knowledge, ranging from sedimentary knowledge
(encyclopedic, linguistic, common-sense and so on) to real-time knowledge (data
streams, events, etc.). If we consider encyclopedic datasets such as DBpedia [18]
or Wikidata [41], we have access to a huge amount of factual knowledge on a
variety of topics. In order to effectively incorporate Linked Open Data in
recommendation applications, there are several aspects to consider. Ultimately,
the goal is to provide the system with background knowledge about the domain of
interest in the form of a knowledge graph. A high-level architecture of the
component in charge of retrieving the portions of the LOD graph that concern the
items in the system, and that are used to form the knowledge graph, contains two
main modules: the Item Linker and the Item Graph Analyzer [28].

Item Linker. The Item Linker addresses the task of linking the items in the
system with the corresponding resources in the LOD knowledge bases. The aim
of this component is to bridge the gap between the items in the catalog and LOD.
We have hypothesized two main ways of performing the linking task: Direct
Item Linking and Item Description Linking. These modules take as input
any dataset in the Linked Open Data cloud and the list of items in the catalog,
with associated side information if available, and return either the mapping
between items and URIs or the set of URIs found in each item description,
depending on the selected module.

Direct Item Linking. This approach is the most straightforward way of accessing
LOD datasets. However, it requires that the items themselves be Linked Open Data
resources; otherwise it cannot be used. For example, using movie title and year
information it is possible to find the corresponding DBpedia resource. However,
it is important to resolve possible cases of ambiguity. The simplest solution is
to exploit the ontology class that the item belongs to. For instance, in the
movie domain we may select only the resources of class dbo:Film.
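
A possible shape for this lookup is sketched below. The helper names
(escape_literal, film_lookup_query, pick_candidate) are hypothetical, and the
strategy is an assumption rather than the procedure actually used to build the
mappings: the query restricts candidates to class dbo:Film and matches the
English rdfs:label against the title, while the year is used afterwards to
choose among labels of the form "Title (1999 film)". The query is only built
here, not sent to an endpoint.

```python
def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def film_lookup_query(title, limit=10):
    """Build a SPARQL query for dbo:Film resources whose English label
    starts with `title` (case-insensitive)."""
    literal = escape_literal(title)
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?film ?label WHERE {{
  ?film a dbo:Film ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en")
  FILTER (STRSTARTS(LCASE(STR(?label)), LCASE("{literal}")))
}}
LIMIT {int(limit)}"""

def pick_candidate(candidates, title, year):
    """Choose a URI among (uri, label) pairs: prefer a label that starts with
    the title and mentions the year, then fall back to an exact label match."""
    wanted = title.lower()
    with_year = [
        uri for uri, label in candidates
        if label.lower().startswith(wanted) and str(year) in label
    ]
    if with_year:
        return with_year[0]
    exact = [uri for uri, label in candidates if label.lower() == wanted]
    return exact[0] if exact else None

print(film_lookup_query("Heat"))
```

Restricting candidates by class removes most non-film homonyms; the year then
separates films that share the same title.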

Item Description Linking. This approach is based on the exploitation of side
information about the items, such as textual descriptions or attributes. Such
information can be used as input for entity linking tools in order to access
LOD resources and link them to the item. Specifically, Entity Linking is the task
of linking an entity mentioned in a text with the corresponding real-world
entity in an existing knowledge base [38]. Many Entity Linking tools have been
proposed in the literature and made available on the Web. Some of them are
Babelfy [21], Dexter [7], DBpedia Spotlight [19], TAGME [14] and NERD [33].

Item Graph Analyzer. This module is responsible for extracting from the
knowledge base a descriptive and informative subgraph for each item, that
is, a set of RDF triples related to the item resource. All the
extracted portions of the original graph can then be merged to obtain a specific
knowledge graph representative of the domain of interest covered by the
recommender. The module takes as input the list of item URIs returned by the Item
Linker and returns a set of RDF triples for each item. Potentially, each item
resource may be connected to a large portion of the LOD graph. However, not all
entities and relations are necessarily informative and descriptive of the item
content. Several strategies for selecting a relevant subset of RDF triples for
each item may be considered and adopted [22,31]. One strategy is to manually
define a set of properties, or sequences of properties, by using some domain
knowledge. SPARQL queries are a powerful tool to pre-filter subgraphs relevant
for the recommendation task.
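
The manual strategy can be illustrated on an in-memory set of triples. In the
sketch below, the ex: resources are hypothetical, the property names merely
mimic DBpedia ones, and the chosen paths are an assumption standing in for
real domain knowledge. Each path is a sequence of properties followed from the
item resource, and only the triples lying on such paths are kept.

```python
# Toy RDF graph as (subject, predicate, object) triples (hypothetical data).
TRIPLES = [
    ("ex:Film1", "dbo:director", "ex:DirectorA"),
    ("ex:Film1", "dbo:starring", "ex:ActorB"),
    ("ex:Film1", "dbo:wikiPageID", "12345"),
    ("ex:DirectorA", "dbo:birthPlace", "ex:CityC"),
    ("ex:ActorB", "dbo:birthPlace", "ex:CityD"),
]

# Manually chosen properties and sequences of properties.
PATHS = [
    ("dbo:director",),
    ("dbo:starring",),
    ("dbo:director", "dbo:birthPlace"),
]

def item_subgraph(triples, item, paths):
    """Collect the triples reachable from `item` by following each path."""
    selected = set()
    for path in paths:
        frontier = {item}
        for prop in path:
            step = {
                (s, p, o) for (s, p, o) in triples
                if s in frontier and p == prop
            }
            selected |= step
            frontier = {o for (_, _, o) in step}
    return selected

def knowledge_graph(triples, items, paths):
    """Merge the per-item subgraphs into a single knowledge graph."""
    graph = set()
    for item in items:
        graph |= item_subgraph(triples, item, paths)
    return graph

for triple in sorted(knowledge_graph(TRIPLES, ["ex:Film1"], PATHS)):
    print(triple)
```

Here the identifier triple and the actor's birthplace are left out because no
selected path reaches them. Against a real endpoint, the same selection would
typically be expressed as a SPARQL query over the chosen properties.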

Evaluating LOD-based RSs
There are many datasets available for the evaluation of recommender systems.
However, such datasets are not appropriate for evaluating LOD-based
recommendation algorithms because they do not contain links to URIs. In order to
evaluate LOD-based RSs, three datasets belonging to different domains (movies,
music and books) have been processed to compute a mapping between items (movies,
artists, books) and their corresponding DBpedia URIs. The mappings for the
datasets are available at https://github.com/sisinflab/LODrecsys-datasets.

MovieLens. This dataset is based on the MovieLens 1M dataset
(http://www.grouplens.org/node/73) released by the GroupLens research group. The
original dataset contains 1,000,209 ratings on a 1-5 star scale given by 6,040
users to 3,883 movies. We found a valid mapping for 3,300 of these movies.

LibraryThing. Derived from the LibraryThing (http://www.librarything.com)
dataset (http://www.macle.nl/tud/LT/), this dataset is related to the
book domain and contains 7,112 users, 37,231 books and 626,000 ratings ranging
from 1 to 10. In this case we found a match for 11,694 books.

LastFM. Unlike the previous ones, this third dataset is based on implicit
feedback consisting of user-artist listening data. Its data come from recent
initiatives on information heterogeneity and fusion in recommender systems
(http://ir.ii.uam.es/hetrec2011/datasets.html) [6], and it has been built
on top of the Last.fm music system (http://www.lastfm.com). The original
dataset contains 1,892 users, 17,632 artists and 92,834 relations between a user
and a listened artist, together with the corresponding listening counts. For this
dataset we found a match for 11,180 of the artists.
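
To put the three mappings on a common scale, the share of items linked to
DBpedia can be derived from the counts reported above. The short computation
below is a convenience sketch using only those counts; the coverage function
and the DATASETS table are names introduced for this example.

```python
# (items in the original dataset, items mapped to a DBpedia URI),
# as reported in the dataset descriptions above.
DATASETS = {
    "MovieLens": (3883, 3300),
    "LibraryThing": (37231, 11694),
    "LastFM": (17632, 11180),
}

def coverage(total, mapped):
    """Percentage of items for which a DBpedia mapping was found."""
    return 100.0 * mapped / total

for name, (total, mapped) in DATASETS.items():
    print(f"{name}: {coverage(total, mapped):.1f}% of items mapped")
```
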


Open Challenges
Recommender systems can be considered a killer application for the exploitation
of the huge amount of knowledge encoded in LOD datasets freely available on the
Web. Although many solutions have been proposed and implemented to deliver
this new generation of knowledge-enabled recommendation engines (see
[31,12,11,28,24,23,10,40,25,9,27,26,16,2,17,34,13]), some important issues remain
open to deeper investigation. Among them, we cite the most important
ones: feature selection, distributed computing, cross-domain recommendation,
computing explanations, and the role of formal reasoning in the recommendation
process.
feature selection The richness of Linked Open Data datasets may turn into
    a pitfall for data-intensive tasks (such as computing recommendations), as it
    potentially introduces noise in the data. Selecting the right features in
    LOD-enabled recommender systems means finding not just the minimal
    meaningful subset of properties directly connected to an item but, given
    the graph-based nature of the underlying data, the minimal meaningful set of
    semantic paths, of arbitrary length, that are representative of the item
    itself.
distributed computing Over the last years, solutions to horizontally distribute
    graph-based data manipulation have been proposed, boosted also by the
    increasing production of data coming from social networks. These methods,
    algorithms and frameworks work quite well with multi-relational graphs where
    the number of possible relations is small compared to that of Linked
    Open Data. New approaches need to be proposed and developed to easily
    deal with all the semantics encoded in LOD datasets.
cross-domain recommendation The highly interconnected nature of datasets
    such as DBpedia or Wikidata represents an opportunity to develop
    cross-domain recommender systems, that is, systems able to recommend items
    in a knowledge domain different from that of the user profile. As an
    example, we may be able to recommend books given a user profile that collects
    information on movies.
computing explanation Sometimes, receiving recommendations may be
    frustrating, as we do not know why the system suggested such
    items to us. Computing explanations for recommendations has been identified
    as a must-have feature for the new generation of recommender systems. In
    this direction, the knowledge available as Linked Open Data may
    play a key role.
formal reasoning LOD datasets are not just a mere collection of data
    represented in a graph-based way. They usually refer to a rich ontology, which
    in turn can be represented by means of expressive logical languages such as
    Description Logics. The adoption of such languages enables the application of
    formal logical reasoning over the underlying data. As of today, due to its
    high computational complexity, such reasoning has not been exploited to
    its full potential, but it can add new value to real knowledge-enabled
    recommender systems.


References
 1. Gediminas Adomavicius and Alexander Tuzhilin. Toward the next generation of
    recommender systems: A survey of the state-of-the-art and possible extensions.
    IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
 2. Vito Walter Anelli, Tommaso Di Noia, Pasquale Lops, and Eugenio Di Sciascio.
    Feature factorization for top-n recommendation: From item rating to features rele-
    vance. In Proceedings of the 1st Workshop on Intelligent Recommender Systems by
    Knowledge Transfer & Learning co-located with ACM Conference on Recommender
    Systems (RecSys 2017), Como, Italy, August 27, 2017., pages 16–21, 2017.
 3. Marko Balabanović and Yoav Shoham. Fab: Content-based, collaborative recom-
    mendation. Commun. ACM, 40(3):66–72, March 1997.
 4. John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive
    algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference
    on Uncertainty in Artificial Intelligence, UAI’98, pages 43–52, 1998.
 5. Robin D. Burke. Hybrid web recommender systems. In The Adaptive Web, Methods
    and Strategies of Web Personalization, pages 377–408, 2007.
 6. Iván Cantador, Peter Brusilovsky, and Tsvi Kuflik. 2nd workshop on information
    heterogeneity and fusion in recommender systems (hetrec 2011). In Proceedings of
    the 5th ACM conference on Recommender systems, RecSys 2011, New York, NY,
    USA, 2011. ACM.
 7. Diego Ceccarelli, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, and
    Salvatore Trani. Dexter: An open source framework for entity linking. In
    Proceedings of the Sixth International Workshop on Exploiting Semantic Annotations
    in Information Retrieval, ESAIR ’13, pages 17–20, New York, NY, USA, 2013. ACM.
 8. Marco de Gemmis, Pasquale Lops, Cataldo Musto, Fedelucio Narducci, and Gio-
    vanni Semeraro. Semantics-Aware Content-Based Recommender Systems, pages
    119–159. Springer US, Boston, MA, 2015.
 9. Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, and Davide Romito.
    Exploiting the web of data in model-based recommender systems. In Proceedings
    of the Sixth ACM Conference on Recommender Systems, RecSys ’12, pages 253–
    256, New York, NY, USA, 2012. ACM.
10. Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, and
    Markus Zanker. Linked open data to support content-based recommender sys-
    tems. In Proceedings of the 8th International Conference on Semantic Systems,
    I-SEMANTICS ’12, pages 1–8, New York, NY, USA, 2012. ACM.
11. Tommaso Di Noia, Vito Claudio Ostuni, Paolo Tomeo, and Eugenio Di Sciascio.
    Sprank: Semantic path-based ranking for top-n recommendations using linked open
    data. ACM Transactions on Intelligent Systems and Technology (TIST), 8(1):9:1–
    9:34, 2016.
12. Tommaso Di Noia, Jessica Rosati, Paolo Tomeo, and Eugenio Di Sciascio. Adaptive
    multi-attribute diversity for recommender systems. Information Sciences, pages –,
    2016.
13. Ignacio Fernández-Tobías, Paolo Tomeo, Iván Cantador, Tommaso Di Noia, and
    Eugenio Di Sciascio. Accuracy and diversity in cross-domain recommendations
    for cold-start users with positive-only feedback. In Proceedings of the 10th ACM
    Conference on Recommender Systems, Boston, MA, USA, September 15-19, 2016,
    pages 119–122, 2016.
14. Paolo Ferragina and Ugo Scaiella. Tagme: On-the-fly annotation of short text
    fragments (by wikipedia entities). In Proceedings of the 19th ACM International
    Conference on Information and Knowledge Management, CIKM ’10, pages 1625–
    1628, New York, NY, USA, 2010. ACM.
15. Rong Jin, Joyce Y. Chai, and Luo Si. An automatic weighting scheme for col-
    laborative filtering. In Proceedings of the 27th Annual International ACM SIGIR
    Conference on Research and Development in Information Retrieval, SIGIR ’04,
    pages 337–344, New York, NY, USA, 2004. ACM.
16. Aleksandra Karpus, Tommaso Di Noia, and Krzysztof Goczyla. Top k recommen-
    dations using contextual conditional preferences model. In Proceedings of the 2017
    Federated Conference on Computer Science and Information Systems, FedCSIS
    2017, Prague, Czech Republic, September 3-6, 2017., pages 19–28, 2017.
17. Aleksandra Karpus, Tommaso Di Noia, Paolo Tomeo, and Krzysztof Goczyla. Rat-
    ing prediction with contextual conditional preferences. In Proceedings of the 8th
    International Joint Conference on Knowledge Discovery, Knowledge Engineering
    and Knowledge Management (IC3K 2016) - Volume 1: KDIR, Porto - Portugal,
    November 9 - 11, 2016., pages 419–424, 2016.
18. Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas,
    Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören
    Auer, and Christian Bizer. DBpedia - a large-scale, multilingual knowledge base
    extracted from wikipedia. Semantic Web Journal, 6(2):167–195, 2015.
19. Pablo N. Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. DBpedia
    spotlight: Shedding light on the web of documents. In Proceedings of the 7th
    International Conference on Semantic Systems, I-Semantics ’11, pages 1–8, New
    York, NY, USA, 2011. ACM.
20. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean.
    Distributed representations of words and phrases and their compositionality. In
    Proceedings of the 26th International Conference on Neural Information Processing
    Systems - Volume 2, NIPS’13, pages 3111–3119, USA, 2013. Curran Associates
    Inc.
21. Andrea Moro, Alessandro Raganato, and Roberto Navigli. Entity Linking meets
    Word Sense Disambiguation : a Unified Approach. Transactions of the Association
    for Computational Linguistics (TACL), 2014.
22. Cataldo Musto, Pasquale Lops, Pierpaolo Basile, Marco de Gemmis, and Giovanni
    Semeraro. Semantics-aware graph-based recommender systems exploiting linked
    open data. In Proceedings of the 2016 Conference on User Modeling Adaptation
    and Personalization, UMAP ’16, pages 229–237, New York, NY, USA, 2016. ACM.
23. Phuong Nguyen, Paolo Tomeo, Tommaso Di Noia, and Eugenio Di Sciascio. An
    evaluation of simrank and personalized pagerank to build a recommender system
    for the web of data. In Proceedings of the 24th International Conference on World
    Wide Web, WWW ’15 Companion, pages 1477–1482, New York, NY, USA, 2015.
    ACM.
24. Phuong T. Nguyen, Paolo Tomeo, Tommaso Di Noia, and Eugenio Di Sciascio.
    Content-Based Recommendations via DBpedia and Freebase: A Case Study in the
    Music Domain, pages 605–621. Springer International Publishing, Cham, 2015.
25. Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, and Roberto Mirizzi.
    Top-n recommendations from implicit feedback leveraging linked open data. In
    Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ’13,
    pages 85–92, New York, NY, USA, 2013. ACM.
26. Vito Claudio Ostuni, Tommaso Di Noia, Roberto Mirizzi, and Eugenio Di Sciascio.
    A linked data recommender system using a neighborhood-based graph kernel. In
    The 15th International Conference on Electronic Commerce and Web Technologies,
    Lecture Notes in Business Information Processing. Springer-Verlag, 2014.
27. Vito Claudio Ostuni, Giosia Gentile, Tommaso Di Noia, Roberto Mirizzi, Davide
    Romito, and Eugenio Di Sciascio. Mobile movie recommendations with linked data.
    In Human-Computer Interaction & Knowledge Discovery @ CD-ARES’13, IFIP
    International Cross Domain Conference and Workshop on Availability, Reliability
    and Security, CD-ARES 2013. Springer, 2013.
28. Vito Claudio Ostuni, Sergio Oramas, Tommaso Di Noia, Xavier Serra, and Eugenio
    Di Sciascio. Sound and music recommendation with knowledge graphs. ACM
    Transactions on Intelligent Systems and Technology (TIST), 8(2):21:1–21:21, 2016.
29. Michael Pazzani and Daniel Billsus. Learning and revising user profiles: The
    identification of interesting web sites. Mach. Learn., 27(3):313–331, June 1997.
30. Michael J. Pazzani and Daniel Billsus. The adaptive web. chapter Content-based
    Recommendation Systems, pages 325–341. Springer-Verlag, Berlin, Heidelberg,
    2007.
31. Azzurra Ragone, Paolo Tomeo, Corrado Magarelli, Tommaso Di Noia, Matteo
    Palmonari, Andrea Maurino, and Eugenio Di Sciascio. Schema-summarization
    in linked-data-based feature selection for recommender systems. In 32nd ACM
    SIGAPP Symposium On Applied Computing. ACM, 2017.
32. Francesco Ricci, Lior Rokach, and Bracha Shapira, editors. Recommender Systems
    Handbook. Springer, 2015.
33. Giuseppe Rizzo and Raphaël Troncy. Nerd: A framework for unifying named entity
    recognition and disambiguation extraction tools. In Proceedings of the Demon-
    strations at the 13th Conference of the European Chapter of the Association for
    Computational Linguistics, EACL ’12, pages 73–76, Stroudsburg, PA, USA, 2012.
    Association for Computational Linguistics.
34. Jessica Rosati, Petar Ristoski, Tommaso Di Noia, Renato De Leone, and Heiko
    Paulheim. RDF graph embeddings for content-based recommender systems. In
    Proceedings of the 3rd Workshop on New Trends in Content-Based Recommender
    Systems co-located with ACM Conference on Recommender Systems (RecSys 2016),
    Boston, MA, USA, September 16, 2016., pages 23–30, 2016.
35. Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based
    collaborative filtering recommendation algorithms. In Proceedings of the 10th In-
    ternational Conference on World Wide Web, WWW ’01, pages 285–295, 2001.
36. J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. The adaptive web.
    chapter Collaborative Filtering Recommender Systems, pages 291–324. Springer-
    Verlag, Berlin, Heidelberg, 2007.
37. Barry Schwartz. The Paradox of Choice: Why More Is Less. Harper Perennial,
    January 2005.
38. Wei Shen, Jianyong Wang, Ping Luo, and Min Wang. Linden: Linking named
    entities with knowledge base via semantic knowledge. In Proceedings of the 21st
    International Conference on World Wide Web, WWW ’12, pages 449–458, New
    York, NY, USA, 2012. ACM.
39. Cheri Speier, Joseph S. Valacich, and Iris Vessey. The Influence of Task Interrup-
    tion on Individual Decision Making: An Information Overload Perspective. Deci-
    sion Sciences, 30(2):337–360, March 1999.
40. Paolo Tomeo, Ignacio Fernández-Tobías, Tommaso Di Noia, and Iván Cantador.
    Exploiting linked open data in cold-start recommendations with positive-only feed-
    back. In Proceedings of the 4th Spanish Conference on Information Retrieval, CERI
    ’16, pages 11:1–11:8, New York, NY, USA, 2016. ACM.
41. Denny Vrandečić. Wikidata: A new platform for collaborative data collection. In
    Proceedings of the 21st International Conference on World Wide Web, WWW ’12
    Companion, pages 1063–1064, New York, NY, USA, 2012. ACM.

</pre>