=Paper=
{{Paper
|id=Vol-1448/paper5
|storemode=property
|title=Generic knowledge-based Analysis of Social Media for Recommendations
|pdfUrl=https://ceur-ws.org/Vol-1448/paper5.pdf
|volume=Vol-1448
|dblpUrl=https://dblp.org/rec/conf/recsys/GraaffVKB15
}}
==Generic knowledge-based Analysis of Social Media for Recommendations==
<pdf width="1500px">https://ceur-ws.org/Vol-1448/paper5.pdf</pdf>
<pre>
                            Generic knowledge-based analysis of
                             social media for recommendations

                                          Victor de Graaff                 Anne van de Venis
                                   Dept. of Computer Science            Dept. of Computer Science
                                      University of Twente                 University of Twente
                                   Enschede, The Netherlands            Enschede, The Netherlands
                                          v.degraaff@utwente.nl         a.j.vandevenis@student.utwente.nl

                                      Maurice van Keulen                       Rolf A. de By
                                   Dept. of Computer Science           Fac. of Geo-Information Science
                                      University of Twente               & Earth Observation (ITC)
                                   Enschede, The Netherlands                University of Twente
                                        m.vankeulen@utwente.nl          Enschede, The Netherlands
                                                                              r.a.deby@utwente.nl


ABSTRACT                                                                alone currently have a total of 5.87 billion facebook-likes [2].
Recommender systems have been around for decades to help                The items that people express a preference for on social me-
people find the best matching item in a pre-defined item                dia, whether through a like of a Facebook page, a follow on
set. Knowledge-based recommender systems are used to                    Twitter, or a tip on the renewed FourSquare, can be taken to
match users based on information that links the two, but                disclose personal traits of interest and the things they want
they often focus on a single, specific application, such as             to be associated with. This vast amount of information is
movies to watch or music to listen to. In this paper, we                the starting point for our Interest-Based Recommender Sys-
present our Interest-Based Recommender System (IBRS).                   tem (IBRS).
This knowledge-based recommender system provides rec-
ommendations that are generic in three dimensions: IBRS                 But what people express their preference for on social media,
is (1) domain-independent, (2) language-independent, and                cannot always directly be related to commonly used tags or
(3) independent of the used social medium. To match user                words in descriptions in an existing item set. These items
interests with items, the first are derived from the user’s             are often example instances of broader concepts. For exam-
social media profile, enriched with a deeper semantic em-               ple: Cristiano Ronaldo has 103 million facebook-likes at the
bedding obtained from the generic knowledge base DBpe-                  time of writing, whereas Soccer (66 million) and Football
dia. These interests are used to extract personalized rec-              (46 million) have considerably fewer facebook-likes.1 Tag
ommendations from a tagged item set from any domain, in                 sets or descriptions, on the other hand, are more likely to
any language. We also present the results of a validation of            contain these broader concepts, as for example is the case
IBRS by a test user group of 44 people using two item sets              in greeting cards, sports equipment, or campsites with soc-
from separate domains: greeting cards and holiday homes.                cer fields. In fact, one of our validation item sets contains
                                                                        tagged greeting cards with practically only generic terms
Keywords                                                                such as soccer/football. To bridge this generalization gap in
Recommender systems, knowledge-based, DBpedia, social                   a domain- and language-independent way, we use the mul-
media, domain-independent, language-independent                         tilingual, generic knowledge base DBpedia to automatically
                                                                        detect broader concepts. We call these concepts the user’s
                                                                        interests. In this paper, we validate our hypothesis that au-
General Terms                                                           tomated user interest detection can also be used to select
Algorithms, Design, Experimentation                                     preferred items in an item set, independent of the item set
                                                                        domain, language and used social medium. As a boundary
Categories and Subject Descriptors                                      requirement to our solution, the cold-start problem, as for
H.4.2 [Information Systems Applications]: Types of                      example discussed by Bobadilla et al. [3], needs to be circum-
Systems—Decision support                                                vented. The system we propose shall be seen as a feature
                                                                        of a larger recommender system, either to bootstrap or to
1.    INTRODUCTION                                                      support that system, rather than as a stand-alone system.
The aim of a recommender system (RS) is to help people
find the items they are most interested in. A requirement               In addition to the recommendation approach we propose in
to provide personalized recommendations is that the RS has              this paper, we also present the results of a validation thereof.
knowledge of the person using it. In 2013, Facebook claimed             A user group of 44 people tested our RS, using item sets
to have 1.11 billion active users [1], and the top-100 pages            from two completely different domains: greeting cards and
CBRecSys 2015, September 20, 2015, Vienna, Austria.                     1 Synonyms like this one cause problems as well, and are
Copyright remains with the authors and/or original copyright holders    discussed in more detail in Section 3.
holiday homes. Both the recommendation selection, as well        [18, 19].
as the explanation interface were validated by these users,
using their own social media profile.                            Our work is inspired by Shi et al.’s HeteRecom [20], which is
                                                                 based on the similarity calculation HeteSim [21]. Similar to
This paper is further structured as follows: related work is     their work, our ultimate goal is to find the matching paths
discussed in Section 2, the motivation behind this research is   between a user and the item set that carry the most weight.
discussed in Section 3, the IBRS technology is presented in      In this paper however, we focus on the detection of existing
Section 4, while the validation approach and results are laid    paths.
out in Section 5, and Section 6 finally contains concluding
remarks and hints at future work.                                3.   MOTIVATION
                                                                 In this work, we aim to extract recommendations that are
2.   RELATED WORK                                                generic in three dimensions: the recommendation approach
The creation of a RS that makes use of social media or DB-       shall be independent of the item set domain, the item set lan-
pedia is not a new ambition. Social media have especially        guage, and the used social medium. As a fourth criterion,
received much attention in the field of content-based recom-     it shall not suffer from any of Bobadilla’s three cold-start
mender systems. Fijalkowski and Zatoka presented an archi-       problem categories. Below, we discuss the motivation for all
tecture of a recommender system for e-commerce based on          of these challenges:
Facebook profiles [4]. Guy et al. proposed five recommender
types, based on social media and/or tags [5]. In their ap-       Domain-independence
proach, they also presented the users with recommendation        As discussed in the previous section, currently most recom-
explanation. The social media they focus on however, are         mender systems based on knowledge bases and social media
not of the mainstream type, but specific for the Lotus Con-      are focused on one specific domain. Independence of the
nections suite. The system of He et al., on the other hand,      item set domain only allows us to reuse the solution and its
uses common social media [6]. Whereas they claim to over-        future improvements for multiple applications.
come the cold-start problem, their system appears to still
suffer from the new item cold-start problem, as described        Language-independence
by Bobadilla et al. [3].                                         Similar to domain-independence as a requirement for reusabil-
                                                                 ity, a language-independent solution improves the RS’s po-
The creation of a RS based on DBpedia has also received          tential to be used in multiple applications. A sub-requirement
quite some attention already, especially in the field of mu-     of language-independence is synonym-independence. As
sic [7, 8] and movie [9, 10, 11, 12, 13] recommendation. Di      Zanardi and Capra pointed out in [22], synonyms are a typ-
Noia et al. took it a step further and also benefited from the   ical RS problem, especially for tag-based RSs. The example
integration of DBpedia in the linked open data (LOD) initia-     of people facebook-liking either the Soccer page or the Foot-
tive. Their movie recommendations are not only based on          ball page from Section 1 already showed that people may
DBpedia knowledge, but also on Freebase and LinkedMDB.           facebook-like different pages, while referring to the same
A more generic approach to create a RS using LOD was done        concept. Despite recent efforts by Facebook to merge pages
by Heitmann and Hayes [14], who also use LOD to over-            about the same topic from different languages into one page,
come the cold-start problem. Even though their validation        and to improve the search functionality to help people find
is based on a music dataset, their approach has the generic-     such pages while searching for their name in a different
ity to be used for other applications as well. Our approach      language, still several pages exist to describe similar con-
for broader concept detection through DBpedia is a form          cepts.
of knowledge-based query expansion. Liang et al. already
showed in [15] that document recommendation based on the
user’s interests improves as a result of query expansion, or     Social medium-independence
semantic-expansion as they call it.                              From the first form of genericity, domain-independence, fol-
                                                                 lows another requirement. Several social media, such as
What distinguishes our approach from other RS research,          Facebook, LinkedIn, Twitter, Instagram, and Pinterest, are
is that we use both social media profiles and DBpedia data       widely used, and each of these has its own focus. When one
to create a generic RS. Passant and Raimond, for exam-           decides to create a RS for job vacancies, LinkedIn may be a
ple, created a RS based on exported social media profiles        more logical social medium to base the recommendations on
and DBpedia data in [8], but their approach is limited to        than any of the others, while a RS for touristic hotspots will
the music-specific relations in DBpedia. To the best of our      most likely lead to another choice. Therefore, to create a RS
knowledge, the only other generic approach is TasteWeights       based on social media content that is domain-independent,
by Bostandjiev et al. [16]. They build a user profile based on   it shall also be independent of the underlying social medium.
social media data, and then apply a collaborative filtering-
based approach to select recommendations. This still implies
all of the three cold-start problem categories: new item, new    Cold-start problem
user, and new community, again as described by Bobadilla et      The cold-start problem has been widely discussed in RS
al. [3]. As it is exactly our goal to overcome the cold-start    literature. Bobadilla et al. categorized it into three sub-
problem, our approach is a hybrid between content-based          categories: the new item problem, the new user problem,
and knowledge-based, according to the RS classification by       and the new community problem [3]. Knowledge-based RS
Burke and Ramezani [17]. Basile, Lops et al. would classify      have been designed to overcome all of these problems, but
our work as a top-down semantics-aware content-based RS          often require domain-specific knowledge.
                      Colosseum          Colosseum        Vespasian           Rome               Rome
                        Pizza               Pizza          Calzone            Italy               Italy
                   Francesco Totti     Francesco Totti    A.S. Roma      Stadio Olimpico     Stadio Olimpico


Figure 1: The IBRS concept, illustrated using the holiday home domain. A user’s preferred items on social media are mapped
onto knowledge base resources. Broader concepts are detected by exploring the knowledge base graph, and finally mapped
onto tags in the item set database.
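
The flow in Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the GRAPH dictionary is a toy stand-in for DBpedia that mirrors the figure, the item names in ITEMS are hypothetical, and rank_items is not the authors' implementation. It simply counts, per item, how many tags coincide with a detected broader concept.

```python
# Toy stand-in for the DBpedia graph: resource -> outgoing neighbors.
# The edges mirror Figure 1 and are illustrative, not real DBpedia data.
GRAPH = {
    "Colosseum": ["Vespasian"],
    "Vespasian": ["Rome"],
    "Pizza": ["Calzone"],
    "Calzone": ["Italy"],
    "Francesco Totti": ["A.S. Roma"],
    "A.S. Roma": ["Stadio Olimpico"],
}

# Hypothetical tagged item set: holiday home -> set of tags.
ITEMS = {
    "home_rome": {"Rome", "Italy", "Stadio Olimpico"},
    "home_tuscany": {"Italy"},
    "home_paris": {"Paris", "France"},
}


def broader_concepts(likes, graph):
    """Collect the second neighbors (A -> B -> C) of every liked resource."""
    concepts = set()
    for a in likes:
        for b in graph.get(a, []):
            concepts.update(graph.get(b, []))
    return concepts


def rank_items(likes, graph, items):
    """Rank items by number of matched tags; items without a match are dropped."""
    concepts = broader_concepts(likes, graph)
    scores = {name: len(tags & concepts) for name, tags in items.items()}
    matched = [name for name in scores if scores[name] > 0]
    matched.sort(key=lambda name: (-scores[name], name))
    return [(name, scores[name]) for name in matched]


ranking = rank_items(["Colosseum", "Pizza", "Francesco Totti"], GRAPH, ITEMS)
print(ranking)
```

Ties are broken alphabetically here purely to keep the output deterministic; the paper does not prescribe a tie-breaking rule.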


Overcoming all of these four challenges at the same time              its second neighbor C.2 In Table 1, we show the top-10 of
has motivated us to create IBRS: a domain-independent,                second neighbors when traversing the DBpedia graph start-
language-independent, social medium-independent, knowl-               ing from the Eiffel Tower as node A, using all four possible
edge-based RS.                                                        direction combinations. DBpedia pages in italics also oc-
                                                                      cur as tags in at least one of our two validation sets, which
4.    CONCEPT & TECHNOLOGY                                            are discussed in detail in Section 5. The first approach,
The foundation of IBRS is the idea that people are more               A → B → C, leads to results describing France, influen-
likely to be interested in items that have a not too distant          tial French people, and several other buildings in France.
relation with things we know they like. Although things               The second approach, A ← B → C, has some overlap with
people express a preference for on social media are typically         the first approach, but also contains several results unre-
in a different domain than our item set, they may still give          lated to France, such as Los Angeles and the United States.
hints towards a person’s interests. In IBRS, we link the              The third approach, A ← B ← C, shows some remarkable
preferred items on social media to resources in the DBpedia           buildings throughout Europe, but also very unrelated lists
Resource Description Framework (RDF) graph. We use this               towards the bottom of the top-10. The fourth and final ap-
graph to explore related concepts, which are then matched             proach, A → B ← C, results in several famous French peo-
with a known tag set, that is used to label the item set. As          ple, especially scientists. Other starting points show similar
a final step, we rank the item set based on the number of             results: the third approach, A ← B ← C, shows promising
matched tags. This concept is illustrated, using the holi-            results for single domain recommendations, whereas the first
day home domain, in Figure 1. In this example, the user               approach shows the best results for broader concept detec-
facebook-liked the Colosseum, pizza, and Francesco Totti.             tion. Since our aim is to match these second neighbors with
These facebook-likes are mapped onto DBpedia, and the                 a tag set, we use the first approach, A → B → C.
DBpedia RDF graph is explored to detect the broader con-
cepts Rome, Italy, and Stadio Olimpico. These items are               4.2   Abstraction layer data model
mapped onto holiday home tags, to ultimately match the                To ensure IBRS genericity, an abstraction layer is used on
user with a specific holiday home.                                    top of the underlying data source, such as a product database.
                                                                      This abstraction layer can consist of physical tables, views,
The remainder of this section is structured as follows: RDF           or a mix thereof, but we will refer to its items as tables from
graph exploration is discussed in Section 4.1. The data               here on. The abstraction layer contains two entity tables:
model of the IBRS abstraction layer is presented in Sec-              abstract items and tags, and one relationship table: ab-
tion 4.2. Section 4.3 presents a method for automated tag             stract items tags, as depicted in Figure 2.
generation from descriptions. In Section 4.4 the ranking
mechanism and Facebook-DBpedia mapping approach are
presented. Section 4.5, finally, presents a short introduction
of the IBRS prototype.
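
The abstraction layer of Section 4.2 can be tried out with an in-memory SQLite database. The sketch below is an assumption-laden illustration: the underscore spellings of the table and column names, the column types, the composite key on abstract_items, and all sample rows are guesses made for the example and are not taken from the IBRS prototype.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Column types and keys are assumptions; the paper only lists the field names.
CREATE TABLE abstract_items (
    id INTEGER NOT NULL,
    object_type TEXT NOT NULL,
    PRIMARY KEY (id, object_type)
);
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,            -- tag in the language of the item set
    name_eng TEXT,                 -- English tag
    dbpedia_resource_id INTEGER    -- cached DBpedia identifier, may be NULL
);
CREATE TABLE abstract_items_tags (
    abstract_item_id INTEGER NOT NULL,
    abstract_item_type TEXT NOT NULL,
    tag_id INTEGER NOT NULL REFERENCES tags(id)
);
""")

# Made-up sample rows for two item sets sharing one IBRS instance.
conn.executemany("INSERT INTO abstract_items VALUES (?, ?)",
                 [(1, "holiday_home"), (2, "holiday_home"), (1, "greeting_card")])
conn.executemany("INSERT INTO tags VALUES (?, ?, ?, ?)",
                 [(1, "Rome", "Rome", 101), (2, "Italië", "Italy", 102),
                  (3, "voetbal", "soccer", None)])
conn.executemany("INSERT INTO abstract_items_tags VALUES (?, ?, ?)",
                 [(1, "holiday_home", 1), (1, "holiday_home", 2),
                  (2, "holiday_home", 2), (1, "greeting_card", 3)])

# Count matched tags per holiday home, given the ids of the matched tags.
matched_tag_ids = (1, 2)
rows = conn.execute(
    """SELECT abstract_item_id, COUNT(*) AS matched
       FROM abstract_items_tags
       WHERE abstract_item_type = ? AND tag_id IN (?, ?)
       GROUP BY abstract_item_id
       ORDER BY matched DESC, abstract_item_id""",
    ("holiday_home", *matched_tag_ids),
).fetchall()
print(rows)
```

Storing the item type in the relation table lets the query above restrict itself to one item set without joining abstract_items, which is one plausible reading of the "improved join executions" remark.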

4.1   DBpedia graph exploration                                                 Figure 2: Abstraction layer data model
After matching a facebook-like with a DBpedia resource, we
traverse the RDF graph in exactly two steps. Since RDF                The abstract items table contains the id and object type
tuples have a subject, predicate and object, RDF graphs are           2 Depending on the directions of the relationships, and the
directed. Therefore, there are four possible different direc-         existence of bi-directional relationships, node A may be
tion combinations to travel from node A through node B to             equal to node C, as can also be seen in Table 1.
  Rank       A → B → C (#)             A ← B → C (#)                  A ← B ← C (#)                   A → B ← C (#)
    1            Paris (20)            Eiffel Tower (41)               Eiffel Tower (7)              Paul Langevin (51)
    2           France (20)               France (17)               Palácio de Ferro (3)            Léon Foucault (48)
    3         Eiffel Tower (7)             Paris (15)             Cologne Cathedral (2)             Jean Témerson (48)
    4        Manuel Valls (6)           Los Angeles (4)          Eiffel Bridge, Ungheni (2)          Frédéric Passy (45)
    5      François Hollande (6)     British Library (4)          Souleuvre Viaduct (2)         L.A. de Bougainville* (45)
    6        Unitary state (6)          Bonnétable (4)              Samuel Hibben (2)            Cecile de Brunhoff (45)
    7       French language (6)      Aarhus University (4)           Casa de Fierro (2)          Adrien-Marie Legendre (45)
    8        Anne Hidalgo (6)         Garabit viaduct (4)       Modern Marvels episodes* (2)         Robert Perrier (45)
    9         Bonnétable (4)       St Paul’s Cathedral (4)     Monopoly editions USA* (2)        Paul Lévy (math.)* (45)
   10       Garabit viaduct (4)        United States (4)            Garabit viaduct (2)               Émile Drain (45)

Table 1: Top-10 of second neighbor nodes C through DBpedia graph exploration in multiple directions for the Eiffel Tower
resource as node A. Numbers between brackets indicate number of paths between that node and the Eiffel Tower node. Items
in italics also occur as tags in at least one of our two validation tag sets. Items marked with an asterisk are abbreviated.
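
To make the four direction combinations concrete, the following sketch counts paths to second neighbors on a tiny directed graph. The edge list is made up for illustration and does not reproduce the numbers in Table 1; predicates are ignored, and the function name second_neighbors is hypothetical.

```python
from collections import Counter, defaultdict

# Illustrative (subject, object) edges; not real DBpedia triples.
EDGES = [
    ("Eiffel_Tower", "Paris"),
    ("Eiffel_Tower", "Gustave_Eiffel"),
    ("Paris", "France"),
    ("Gustave_Eiffel", "France"),
    ("Garabit_viaduct", "Gustave_Eiffel"),
    ("Anne_Hidalgo", "Paris"),
]

out_edges = defaultdict(list)  # node -> nodes it points to
in_edges = defaultdict(list)   # node -> nodes pointing to it
for subject, obj in EDGES:
    out_edges[subject].append(obj)
    in_edges[obj].append(subject)


def step(node, direction):
    """Follow edges forwards ("out") or backwards ("in") from node."""
    table = out_edges if direction == "out" else in_edges
    return table.get(node, [])


def second_neighbors(a, first, second):
    """Count the paths from node a to every second neighbor C."""
    counts = Counter()
    for b in step(a, first):
        for c in step(b, second):
            counts[c] += 1
    return counts


# A -> B -> C, the combination IBRS uses for broader concept detection.
forward = second_neighbors("Eiffel_Tower", "out", "out")
# A -> B <- C, which can lead back to the starting node itself.
siblings = second_neighbors("Eiffel_Tower", "out", "in")
print(forward.most_common())
print(siblings.most_common())
```

In this toy graph the A → B ← C combination returns the starting node as its own second neighbor, the same effect the footnote on bi-directional relationships describes for Table 1.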


of the items in the item set. The object type field allows         APIs are the same. Some social medium APIs allow devel-
us to use one IBRS instance for the recommendation of mul-         opers to find out what a user’s friends prefer, while others
tiple item sets.                                                   limit the developer to information about the logged in user.
                                                                   Therefore, when using the Facebook Graph API, we lim-
The tags table contains the tag’s id, name, and dbpe-              ited ourselves to the name and category elements of each
dia resource id. The name field can be used in the lan-            facebook-liked page.
guage of the item set tags. Since we have one item set that
is tagged in Dutch, and one item set that is tagged in En-         Matching social media items with DBpedia resources
glish, we added the name eng field for English tags. The           Facebook-likes are mapped onto DBpedia resources through
dbpedia resource id is cached in the database for better           their name. Those facebook-pages that mapped onto am-
performance.                                                       biguous terms in DBpedia were filtered out. To create a
                                                                   more complete mapping, we used the category element to
The abstract items tags table is a regular relation table          postfix the name of those pages for which the cat-
containing the abstract item id and tag id. It also con-           egory element was filled with “movie,” “tv show,” or “mu-
tains the abstract item type for improved join executions.         sician/band.” In these cases, we also checked if a page ex-
                                                                   ists with the additional suffix “ (movie),” “ (TV series),” or
4.3    Tag generation                                              “ (band)” respectively. This leads to the following SPARQL
In case an item set is not tagged, but does contain descrip-       query:
tive texts, tags can be extracted automatically. Natural lan-
guage processing algorithms can be used for this purpose,
such as the named entity extraction and disambiguation ap-         PREFIX dbpont: <http://dbpedia.org/ontology/>
                                                                   PREFIX dbpres: <http://dbpedia.org/resource/>
proach by Habib et al. [23]. We used Habib’s approach with         # We use the prefixed versions here for readability
a manually trained model to extract named entities from
holiday home descriptions. A drawback of this approach             SELECT ?uri ?label
is that descriptions are often the result of free-text input.      WHERE {
Phrases such as “only a 3 hour flight from Amsterdam” or            # Find exact match with category suffix
“25 kilometers from the border with France” led to correctly        { ?uri dbpont:wikiPageID [].
                                                                        FILTER(?uri = dbpres:The_Net_(movie)) }
extracted named entities, but semantically not the best tags
to distinguish this object from others. Therefore, we addi-         # Or exact match without category suffix
tionally removed those tags that tagged a holiday home with         UNION { ?uri dbpont:wikiPageID [].
another country than the one it is located in. In total, this           FILTER(?uri = dbpres:The_Net) }
approach allowed us to assign 455,777 (non-unique) tags to
42,148 holiday homes, from which 106,430 tags (of which              # Or the label version
                                                                    UNION {?uri rdfs:label "The_Net"@en.}
12,151 unique) could be mapped onto a DBpedia resource.
                                                                    # Check if page has redirect
4.4    Ranking                                                      UNION { dbpres:The_Net_(movie)
The IBRS ranking method consists of four steps: (1) retriev-            dbpont:wikiPageRedirects ?uri}
                                                                    UNION { dbpres:The_Net
ing preferred items from social media, (2) matching these               dbpont:wikiPageRedirects ?uri}
items with DBpedia resources, (3) extracting abstracts from
DBpedia, (4) ranking items based on matched tags. For per-          ?uri rdfs:label ?label.
formance reasons, several items are cached offline.                 ?uri dbpont:wikiPageID ?wikiPageid.
                                                                    FILTER (langMatches(lang(?label),"en")).
Obtaining preferred items from social media                         # Filter out ambiguous terms
To map social media items while remaining independent of            FILTER NOT EXISTS { ?uri
the social medium, we must take into account that not all             dbpont:wikiPageDisambiguates ?disambiguates } .
                                                                  still consider the concept itself domain-independent. This
 # Filter out Wikipedia categories                                in contrast to for example music recommenders that rely on
 MINUS {?uri rdf:type skos:Concept}                               the artist-song relationship.
}
LIMIT 1
                                                                  4.5   Prototype
                                                                  For demonstration and validation purposes, we have created
Using this approach on a test set of 11,674 unique Facebook       a prototype of IBRS, using the Cake PHP platform. The
pages, obtained from the likes of 309 users, we were able         prototype can be used with either one’s own Facebook pro-
to match 2,240 (19.2%) Facebook-pages with a DBpedia re-          file, or by manually combining several DBpedia resources.
source.                                                           It can be accessed through http://ibrs.ewi.utwente.nl.
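
The name-based mapping can be illustrated by generating the candidate DBpedia resource names for a Facebook page. The category-to-suffix pairs come from the text; the function name, the underscore normalisation, and the ordering of candidates are assumptions made for this sketch, and no query is sent to DBpedia.

```python
# Category -> suffix pairs as listed in the text.
CATEGORY_SUFFIX = {
    "movie": "(movie)",
    "tv show": "(TV series)",
    "musician/band": "(band)",
}


def candidate_resource_names(page_name, category):
    """Return resource names to try, the category-suffixed one first."""
    base = page_name.strip().replace(" ", "_")
    candidates = []
    suffix = CATEGORY_SUFFIX.get(category.strip().lower())
    if suffix:
        candidates.append(f"{base}_{suffix}")
    candidates.append(base)
    return candidates


print(candidate_resource_names("The Net", "Movie"))
print(candidate_resource_names("Eiffel Tower", "Landmark"))

# Sanity check of the reported match rate: 2,240 of 11,674 pages.
match_rate = round(100 * 2240 / 11674, 1)
print(match_rate)
```

Each candidate would then be substituted into the SPARQL query above, which also follows redirects and filters out disambiguation pages.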

Extracting abstracts from DBpedia                                 5.    VALIDATION
For all matched DBpedia resources, the abstracts are re-          To validate our ranking mechanism, as well as to deter-
trieved from the SPARQL endpoint provided by DBpedia              mine the user perception of recommendations with explana-
[24] using the following query:                                   tions, we validated IBRS in a carefully designed user study
                                                                  with a test user group of 44 people. We used two prod-
                                                                  uct sets from different domains to demonstrate its domain-
PREFIX dbpont: <http://dbpedia.org/ontology/>                     independence: greeting cards and holiday homes. The greet-
PREFIX dbpres: <http://dbpedia.org/resource/>
                                                                  ing card set contains Dutch tags, while the holiday homes
SELECT DISTINCT                                                   did not contain any tags, but only descriptions. From the
 ?o3 (count(?o3) as ?count) ?abstract ?label                      holiday homes, we used the English descriptions to extract
                                                                  (English) tags, to emphasize the potential to use IBRS in a
WHERE {                                                           language-independent way.
 # UNION concatenation of mapped FB pages
 {dbpres:Vienna ?p1 ?o2} UNION
                                                                  This section is further structured as follows: Section 5.1
 {dbpres:Recommender_system ?p1 ?o2} UNION
 {dbpres:Computer_science ?p1 ?o2}                                describes the item set details. In Section 5.2, we present
                                                                  the approach taken to validate both our ranking mechanism
 # Neighboring object has Wikipage                                and the recommendation explanation interface. Section 5.3
 ?o2 dbpont:wikiPageID ?o2id ;                                    finally, discusses the validation results.
 # Neighboring object has neighbor
    ?p2 ?o3 .                                                     5.1   Item set details
                                                                  The first item set contains greeting cards from the Dutch
 # Second neighbor object has Wikipage                            company Kaartje2Go (“Card2Go”). People search through
 ?o3 dbpont:wikiPageID ?o3id ;
    dbpont:abstract ?abstract ;
                                                                  a collection of cards electronically, which are distributed
    rdfs:label ?label .                                           through regular (non-electronic) mail by Kaartje2Go in the name
                                                                  of the customer. The customers can choose between sending
 # English is used as an example                                  greeting cards to one or multiple people at once. 75% of the
 FILTER(langMatches(lang(?abstract), 'en')) .                     purchases are of the latter type, for which the preferences of
 FILTER(langMatches(lang(?label), 'en')) .                        the sender are more relevant than those of the (potentially
 # Second neighbor object must not be a category
                                                                  many) recipients. To facilitate the search, users can search
 MINUS {?o3 rdf:type skos:Concept}                                for tags that have been entered manually by the Kaartje2Go
}                                                                 employees. These tags, which are mostly in Dutch, are in-
                                                                  consistent in their completeness: for example some of the
# ‘Only’ the 1000 most important abstracts                        soccer cards are also tagged using the names of popular
ORDER BY DESC(?count)                                             Dutch soccer teams, but not all of them. Less popular teams
LIMIT 1000
                                                                  are never mentioned as tags. The top-10 of the translated
                                                                  greeting card tags can be found in Table 2.
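The query uses English only as an example language. As a small illustration, the two language filters could be generated for any language tag with a helper like the one below. This is a hedged Python sketch: the function name `language_filters` and the tag check are hypothetical and not part of IBRS.

```python
import re

# Rough plausibility check for a language tag such as "en", "nl" or "en-GB".
_LANG_TAG = re.compile(r"^[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*$")


def language_filters(variables, lang="en"):
    """Build one langMatches FILTER line per SPARQL variable name."""
    if not _LANG_TAG.match(lang):
        raise ValueError(f"not a plausible language tag: {lang!r}")
    return "\n".join(
        f"FILTER(langMatches(lang(?{name}), '{lang}')) ." for name in variables
    )


# Example: the same two filters as in the query, but for Dutch.
dutch_filters = language_filters(["abstract", "label"], lang="nl")
print(dutch_filters)
```

The generated lines would replace the two FILTER lines of the query; the rest of the query stays unchanged.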
Ranking items based on matched tags
Each tag that (1) has a dbpedia resource id and (2) is            The second item set contains holiday homes from the hol-
contained in at least one of the downloaded abstracts, is         iday home portal EuroCottage. This item set did not con-
marked as a matched tag. The item set is then ranked on the       tain tags, but a description in one, two or three languages
basis of the number of matched tags. As a final step, those       (Dutch, English and/or German). We followed the approach
items that are too close to a higher ranked item, based on a      discussed in Section 4.3 to extract mentions of geographic
pre-defined distance function, are removed from the ranking.      places from the English holiday home descriptions. The
This last step is added to ensure diversity among the recom-      top-10 of resulting tags can be found in Table 3. The advan-
mended items. For the recommendation of geographic ob-            tage of extracting geographic places is that these also often
jects, as for example in a geo-social RS like the one discussed   have Wikipedia pages, which makes them suitable for the re-
in [25], one can think of the Euclidean distance, but for         quirement that the tags need to have a dbpedia resource id.
more generic purposes the cosine similarity (as for example       Many pages of the holiday home descriptions were in Ger-
discussed in [22]) of the item’s tags may be a good starting      man, even though they were entered into the system by the
point. The tag input makes our RS domain-aware. However,          holiday home owners as English descriptions. As a result
since the approach can be applied to any tag domain, we           thereof, many German words or phrases were extracted as
                   Tag         Frequency
                 Birthday         7,535
                  Party           4,200
                   Love           2,521
                   Girl           2,268
                   Boy            2,084
                  Infant          2,056
                Photograph        1,793
                 Marriage         1,543
                   Cool           1,381
                 Animals          1,373

Table 2: Top-10 of (translated) manual greeting card tags
with a DBpedia reference, ordered by the number of cards
with this tag

geographical references, since the model was trained for
English descriptions. However, the impact of these terms was
practically zero, as these extracted tags were not matched
with an English DBpedia resource (footnote 3). For the
validation, the holiday homes were plotted on a map that was
zoomed in on Europe, since most holiday homes in the set are
located there. A relatively small subset of homes outside
Europe could therefore not be displayed on the map, and were
removed from the validation set, just as those without a
coordinate pair. This coordinate pair was also used for the
diversity function: all top-10 holiday homes had to be located
at least 250 kilometers away from higher ranked items.

                   Tag            Frequency
                 Florence            760
                   Siena             656
             Mediterranean Sea       634
                 Tuscany             537
                 Legoland            513
                  Venice             508
                 Sotkamo             448
                  Europe             440
                Ardennes             421
                   Pisa              363

Table 3: Top-10 of extracted tags for holiday homes with a
DBpedia reference, ordered by the number of holiday homes
with this tag

5.2   Validation approach
Our test users were requested to participate through
Facebook, and used their own existing Facebook account for the
recommendations. The test users were not aware of what they
were testing, except for the information that they were
testing an RS. Most test users do not have a background in
computer science, and none of them were aware of how IBRS
works. We asked our test users to validate our algorithm
through a total of 30 questions, split up into three batches
of 10. Once a question had been answered, users could not
return to that question. The first two batches were intended
to validate our ranking mechanism; the third batch was
intended to determine the user perception of recommendations
with explanations, as compared to recommendations without
explanations.

For the first ten questions, users were asked to select their
favorite greeting card from a greeting card pair using the
interface of Figure 3. On one side of the screen, an item from
the top-10 greeting cards according to IBRS was shown. On the
other side, a card was shown that was not tagged with any of
the matched tags. We called these recommendations Inverted
IBRS. IBRS and Inverted IBRS were shown on the left or right
side at random.

Figure 3: Validation interface for greeting card comparison

For the second batch of ten questions, our test users were
presented with the choice between two holiday homes, in a
similar way. Again, IBRS and Inverted IBRS were shown on the
left or right side at random. For each holiday home, its
location was shown on a map, with the name of the holiday home
and the first 1000 characters of its description, as shown in
Figure 4.

Figure 4: Validation interface for holiday home comparison
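The IBRS top-10 lists used in these batches follow from the ranking step and the diversity function described earlier: items are ordered by their number of matched tags, and an item is dropped when it lies too close to a higher ranked item (250 kilometers for the holiday homes). A minimal Python sketch of that procedure is given below, under stated assumptions. The `Item` class, all function names, the greedy comparison against already selected items only, and the haversine great-circle distance are illustrative choices, not taken from the IBRS implementation.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Item:
    """Hypothetical item record: name, tag set, and a coordinate pair."""
    name: str
    tags: frozenset
    lat: float = 0.0
    lon: float = 0.0


def haversine_km(a, b):
    """Great-circle distance in kilometers (an assumed distance function)."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(min(1.0, h)))


def tag_cosine_distance(a, b):
    """1 minus the cosine similarity of the items' binary tag vectors."""
    if not a.tags or not b.tags:
        return 1.0
    return 1.0 - len(a.tags & b.tags) / sqrt(len(a.tags) * len(b.tags))


def rank_items(items, matched_tags, distance, min_distance, top_n=10):
    """Rank by matched-tag count, then drop items too close to a kept item."""
    matched = set(matched_tags)
    # sorted() is stable, so items with equal counts keep their input order.
    ranked = sorted(items, key=lambda it: len(it.tags & matched), reverse=True)
    selected = []
    for candidate in ranked:
        # Assumption: a candidate is only compared with items already kept.
        if all(distance(candidate, kept) >= min_distance for kept in selected):
            selected.append(candidate)
        if len(selected) == top_n:
            break
    return selected


# Hypothetical holiday homes with approximate coordinates.
homes = [
    Item("Florence villa", frozenset({"Florence", "Tuscany"}), 43.77, 11.25),
    Item("Siena farmhouse", frozenset({"Siena", "Tuscany"}), 43.32, 11.33),
    Item("Ardennes chalet", frozenset({"Ardennes"}), 50.25, 5.67),
]
top = rank_items(homes, {"Florence", "Tuscany", "Ardennes"}, haversine_km, 250.0)
print([h.name for h in top])  # ['Florence villa', 'Ardennes chalet']
```

The Siena home is dropped because it lies well within 250 kilometers of the higher ranked Florence home. For items without coordinates, such as greeting cards, `tag_cosine_distance` could be passed instead, with a threshold between 0 and 1 that would have to be chosen per item set.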
3 Even though the approach can be applied to any language
contained in the knowledge base, the tags are still matched
with knowledge base resources in the tag language.

The final batch of ten questions required the test users to
rate a recommendation. Each of the holiday homes was one of
the top-10 holiday homes according to IBRS. At random, a user
was assigned to the group of users who received
recommendations with an explanation, as shown in Figure 5, or
without an explanation.

[Figure 6 pie charts]
(a) Split out between greeting cards and holiday homes
    (batches counted separately): IBRS 47%, Tie 22%,
    Inverted IBRS 31%
(b) Overall (batches combined): IBRS 55%, Tie 11%,
    Inverted IBRS 34%
Figure 6: Most frequent choices per user for the first two
batches of questions
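The per-user majority vote summarized in Figure 6 can be sketched in Python as follows. The vote encoding (one `"IBRS"` or `"Inverted"` string per answered question) and the function names are assumptions for illustration; the format of the actual validation data is not described here.

```python
from collections import Counter


def preferred_system(votes):
    """Return the system a user chose most often, or "Tie" on equal counts."""
    counts = Counter(votes)
    ibrs, inverted = counts["IBRS"], counts["Inverted"]
    if ibrs > inverted:
        return "IBRS"
    if inverted > ibrs:
        return "Inverted"
    return "Tie"


def preference_shares(votes_per_user):
    """Fraction of vote lists per outcome; one list per user (or per user and batch)."""
    outcomes = Counter(preferred_system(votes) for votes in votes_per_user)
    total = sum(outcomes.values())
    if total == 0:
        return {}
    return {label: outcomes[label] / total for label in ("IBRS", "Tie", "Inverted")}


# Hypothetical answers of three users.
example = [
    ["IBRS", "IBRS", "Inverted", "IBRS", "Inverted"],
    ["Inverted", "Inverted", "IBRS", "Inverted", "Inverted"],
    ["IBRS", "Inverted", "IBRS", "Inverted"],
]
shares = preference_shares(example)
print(shares)
```

Under this reading, passing one list per user and batch corresponds to panel (a), while concatenating the two batches of a user into one list before the call corresponds to panel (b). Longer lists make exact ties less likely, which matches the lower tie share in the combined panel.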


Figure 5: Cut-out of validation interface for holiday home
recommendation rating. The lines in orange/blue contain the
matched tags.

In test runs of the validation process, we determined that in
a set-wise comparison of the two systems, users tended to
prefer the set that was spread out over the map, rather than
one that contained clusters of recommendations. Since Inverted
IBRS is extremely spread out, due to the fact that items had
no relation with the users or each other, this caused a bias
in the validation results. Therefore, we decided to only
compare the results item-wise. Furthermore, we removed tags
with a negative connotation, such as “die” or “death.”

5.3   Validation results
The first two batches of the validation were used to determine
the potential of the IBRS ranking mechanism. The results are
shown in the pie charts of Figure 6. Figure 6a shows which
system was the test user’s preferred system, based on a
majority vote between the two systems. Most users participated
in the validation of both the recommendation of greeting cards
and holiday homes. Each batch was counted separately. 47% of
the users preferred IBRS, 22% voted equally often for both of
the systems, and 31% of the users preferred Inverted IBRS. In
the pie chart of Figure 6b, the results are shown when the
results of holiday homes and greeting cards are combined per
user. Since this increases the number of votes per user, ties
are less common. In this scenario, 55% of the users preferred
the IBRS results, while 34% preferred Inverted IBRS.

The final batch of the validation was used to determine the
usefulness of the proposed recommendation explanation
interface for holiday homes. The results of this batch are
shown in the histograms of Figure 7. Contrary to our
expectations, users preferred to receive recommendations
without explanations. Using the 5-point Likert scale, the
users who were presented with an interface with explanations
rated the recommendations with an average score of 3.3772,
while users without recommendation explanation rated the
recommendations with a 3.4709 on average. From this
validation, we can conclude that people who receive
recommendations based on tags that do not describe them well
are more likely to reject a recommendation with a “strongly
disagree” when they see the rationale behind the
recommendation.

Despite satisfying results with respect to the system’s
potential to rank recommendations for users, we should not
forget that many aspects play a role in the decision-making
that cannot (yet) be detected from Facebook profiles. When
choosing either a greeting card, a holiday home, or anything
else, one will always look at domain-specific item
characteristics. For a greeting card, the user looks at
colors, style, and the occasion the card is sent for.
Similarly, for a holiday home, the user looks at price, number
of beds, the picture of the home, and the distance to the
beach. For this reason, this approach should only be used as a
feature of a larger system.

[Figure 7 histograms: relative frequency (0 to 0.4) per
rating (1 to 5)]
(a) With recommendation explanation; average rating: 3.3772.
(b) Without recommendation explanation; average rating: 3.4709.
Figure 7: Recommendation ratings split out by recommendation
presentation interface

6.   CONCLUSION
In this paper, we presented the approach behind IBRS. We
discussed the concept of mapping items marked as preferred or
liked in social media onto a generic knowledge base, and query
expansion using DBpedia. We presented the technology,
including the abstraction layer, tag generation approach, and
ranking mechanism. We also presented the validation results of
a test user group. As said, we recommend using the proposed
and validated approach from this paper as a feature of a
larger recommender system. In a more complete system, one also
needs to take domain-specific features, as well as item
popularity and other collaborative filtering features, into
account. However, these features would conflict with our
objective to create a generic RS that
overcomes the cold-start problem, and therefore were not
taken into account in this work.

Currently, IBRS uses all paths in the knowledge base graph as
an indication for a useful recommendation. However, some paths
in the graph actually form a reason not to recommend that
item. For example, in the holiday home domain, a user is less
likely to book a home in his own town, even though there may
be many paths between him and that holiday home based on his
local likes. Furthermore, some nodes are more useful than
others for recommendation. DBpedia nodes like “European
Central Time” have a lot of incoming paths, while it is
unlikely that this actually forms an interest for this user.
The next step for IBRS is to further improve the ranking
mechanism by incorporating these characteristics and to
explore the possibility of automatically detecting (negative)
weights of paths.

7.   ACKNOWLEDGEMENTS
This publication was supported by the Dutch national program
COMMIT/. We also thank Mena Habib for his support in the tag
generation process.

8.   REFERENCES
 [1] Facebook, “Facebook | photos.”
     https://www.facebook.com/facebook, 2013.
 [2] Socialbakers, “Statistics of the top facebook pages.”
     http://www.socialbakers.com/statistics/facebook/pages/total/,
     2013.
 [3] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez,
     “Recommender systems survey,” Knowledge-Based Systems,
     vol. 46, pp. 109–132, 2013.
 [4] D. Fijalkowski and R. Zatoka, “An architecture of a web
     recommender system using social network user profiles for
     e-commerce,” in Computer Science and Information Systems
     (FedCSIS), 2011 Federated Conference on, pp. 287–290,
     IEEE, 2011.
 [5] I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and E. Uziel,
     “Social media recommendation based on people and tags,”
     in Proc. of the 33rd intern. ACM SIGIR conference on
     Research and development in information retrieval,
     pp. 194–201, ACM, 2010.
 [6] J. He and W. W. Chu, A social network-based recommender
     system (SNRS). Springer, 2010.
 [7] A. Passant, “dbrec - music recommendations using
     DBpedia,” in The Semantic Web–ISWC 2010, pp. 209–224,
     Springer, 2010.
 [8] A. Passant and Y. Raimond, “Combining social music and
     semantic web for music-related recommender systems,” in
     The 7th International Semantic Web Conference, p. 19,
     Citeseer, 2008.
 [9] R. Mirizzi, T. Di Noia, A. Ragone, V. C. Ostuni, and
     E. Di Sciascio, “Movie recommendation with DBpedia,” in
     IIR, pp. 101–112, Citeseer, 2012.
[10] J. Golbeck and J. Hendler, “Filmtrust: Movie
     recommendations using trust in web-based social
     networks,” in Proceedings of the IEEE Consumer
     communications and networking conference, vol. 96,
     pp. 282–286, University of Maryland, 2006.
[11] B. N. Miller, I. Albert, S. K. Lam, J. A. Konstan, and
     J. Riedl, “MovieLens unplugged: experiences with an
     occasionally connected recommender system,” in
     Proceedings of the 8th international conference on
     Intelligent user interfaces, pp. 263–266, ACM, 2003.
[12] V. C. Ostuni, T. Di Noia, R. Mirizzi, D. Romito, and
     E. Di Sciascio, “Cinemappy: a context-aware mobile app
     for movie recommendations boosted by DBpedia,” SeRSy,
     vol. 919, pp. 37–48, 2012.
[13] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos,
     “Moviexplain: a recommender system with explanations,” in
     Proceedings of the third ACM conference on Recommender
     systems, pp. 317–320, ACM, 2009.
[14] B. Heitmann and C. Hayes, “Using linked data to build
     open, collaborative recommender systems,” in AAAI spring
     symposium: linked data meets artificial intelligence,
     pp. 76–81, 2010.
[15] T.-P. Liang, Y.-F. Yang, D.-N. Chen, and Y.-C. Ku, “A
     semantic-expansion approach to personalized knowledge
     recommendation,” Decision Support Systems, vol. 45,
     no. 3, pp. 401–412, 2008.
[16] S. Bostandjiev, J. O’Donovan, and T. Höllerer,
     “TasteWeights: a visual interactive hybrid recommender
     system,” in Proc. of the 6th ACM conf. on Recommender
     systems, pp. 35–42, ACM, 2012.
[17] R. Burke, “Hybrid web recommender systems,” in The
     adaptive web, pp. 377–408, Springer, 2007.
[18] P. Lops, “Semantics-aware content-based recommender
     systems,” October 2014. Keynote at Workshop on New Trends
     in Content-based Recommender Systems.
[19] P. Basile, C. Musto, M. de Gemmis, P. Lops, F. Narducci,
     and G. Semeraro, “Content-based recommender systems +
     DBpedia knowledge = semantics-aware recommender
     systems,” in Semantic Web Evaluation Challenge,
     pp. 163–169, Springer, 2014.
[20] C. Shi, C. Zhou, X. Kong, P. S. Yu, G. Liu, and B. Wang,
     “HeteRecom: A semantic-based recommendation system in
     heterogeneous networks,” in Proceedings of the 18th ACM
     SIGKDD international conference on Knowledge discovery
     and data mining, pp. 1552–1555, ACM, 2012.
[21] C. Shi, X. Kong, Y. Huang, P. S. Yu, and B. Wu, “HeteSim:
     A general framework for relevance measure in
     heterogeneous networks,” IEEE Transactions on Knowledge &
     Data Engineering, no. 10, pp. 2479–2492, 2014.
[22] V. Zanardi and L. Capra, “Social ranking: uncovering
     relevant content using tag-based recommender systems,” in
     Proceedings of the 2008 ACM conference on Recommender
     systems, pp. 51–58, ACM, 2008.
[23] M. B. Habib and M. van Keulen, “Improving toponym
     disambiguation by iteratively enhancing certainty of
     extraction,” in Proceedings of the 4th International
     Conference on Knowledge Discovery and Information
     Retrieval, KDIR 2012, Barcelona, Spain, pp. 399–410,
     SciTePress, October 2012.
[24] DBpedia, “SPARQL explorer for
     http://dbpedia.org/sparql.” http://dbpedia.org/snorql/,
     2015.
[25] V. de Graaff, M. van Keulen, and R. A. de By, “Towards
     geosocial recommender systems,” in 4th Intern. Workshop
     on Web Intelligence & Communities (WI&C 2012), Lyon,
     France, ACM, 2012.

</pre>