Generic knowledge-based analysis of social media for recommendations

Victor de Graaff, Dept. of Computer Science, University of Twente, Enschede, The Netherlands, v.degraaff@utwente.nl
Anne van de Venis, Dept. of Computer Science, University of Twente, Enschede, The Netherlands, a.j.vandevenis@student.utwente.nl
Maurice van Keulen, Dept. of Computer Science, University of Twente, Enschede, The Netherlands, m.vankeulen@utwente.nl
Rolf A. de By, Fac. of Geo-Information Science & Earth Observation (ITC), University of Twente, Enschede, The Netherlands, r.a.deby@utwente.nl
ABSTRACT
Recommender systems have been around for decades to help people find the best matching item in a pre-defined item set. Knowledge-based recommender systems match users with items based on information that links the two, but they often focus on a single, specific application, such as movies to watch or music to listen to. In this paper, we present our Interest-Based Recommender System (IBRS). This knowledge-based recommender system provides recommendations that are generic in three dimensions: IBRS is (1) domain-independent, (2) language-independent, and (3) independent of the used social medium. To match user interests with items, the interests are derived from the user's social media profile and enriched with a deeper semantic embedding obtained from the generic knowledge base DBpedia. These interests are used to extract personalized recommendations from a tagged item set from any domain, in any language. We also present the results of a validation of IBRS by a test user group of 44 people using two item sets from separate domains: greeting cards and holiday homes.

Keywords
Recommender systems, knowledge-based, DBpedia, social media, domain-independent, language-independent

General Terms
Algorithms, Design, Experimentation

Categories and Subject Descriptors
H.4.2 [Information Systems Applications]: Types of Systems—Decision support

CBRecSys 2015, September 20, 2015, Vienna, Austria.
Copyright remains with the authors and/or original copyright holders.

1. INTRODUCTION
The aim of a recommender system (RS) is to help people find the items they are most interested in. A requirement for providing personalized recommendations is that the RS has knowledge of the person using it. In 2013, Facebook claimed to have 1.11 billion active users [1], and the top-100 pages alone currently have a total of 5.87 billion facebook-likes [2]. The items that people express a preference for on social media, whether through a like of a Facebook page, a follow on Twitter, or a tip on the renewed Foursquare, can be taken to disclose personal traits of interest and the things they want to be associated with. This vast amount of information is the starting point for our Interest-Based Recommender System (IBRS).

However, what people express a preference for on social media cannot always be related directly to commonly used tags or to words in the descriptions of an existing item set. These preferred items are often specific instances of broader concepts. For example, Cristiano Ronaldo has 103 million facebook-likes at the time of writing, whereas Soccer (66 million) and Football (46 million) have considerably fewer facebook-likes.¹ Tag sets or descriptions, on the other hand, are more likely to contain these broader concepts, as is the case for greeting cards, sports equipment, or campsites with soccer fields. In fact, one of our validation item sets contains greeting cards tagged with practically only generic terms such as soccer/football. To bridge this generalization gap in a domain- and language-independent way, we use the multilingual, generic knowledge base DBpedia to automatically detect broader concepts. We call these concepts the user's interests. In this paper, we validate our hypothesis that automated user interest detection can also be used to select preferred items in an item set, independent of the item set domain, language, and used social medium. As a boundary requirement for our solution, the cold-start problem, as discussed for example by Bobadilla et al. [3], needs to be circumvented. The system we propose should be seen as a feature of a larger recommender system, either to bootstrap or to support that system, rather than as a stand-alone system.

¹ Synonyms like this one cause problems as well, and are discussed in more detail in Section 3.

In addition to the recommendation approach we propose in this paper, we also present the results of a validation thereof. A user group of 44 people tested our RS, using item sets from two completely different domains: greeting cards and
holiday homes. Both the recommendation selection and the explanation interface were validated by these users, using their own social media profiles.

This paper is further structured as follows: related work is discussed in Section 2, the motivation behind this research in Section 3, and the IBRS technology in Section 4. The validation approach and results are laid out in Section 5, and Section 6 finally contains concluding remarks and hints at future work.

2. RELATED WORK
The creation of an RS that makes use of social media or DBpedia is not a new ambition. Social media have received much attention, especially in the field of content-based recommender systems. Fijalkowski and Zatoka presented an architecture of a recommender system for e-commerce based on Facebook profiles [4]. Guy et al. proposed five recommender types based on social media and/or tags [5]. In their approach, they also presented the users with recommendation explanations. The social media they focus on, however, are not of the mainstream type, but specific to the Lotus Connections suite. The system of He et al., on the other hand, uses common social media [6]. Whereas they claim to overcome the cold-start problem, their system appears to still suffer from the new item cold-start problem, as described by Bobadilla et al. [3].

The creation of an RS based on DBpedia has also received considerable attention already, especially in the field of music [7, 8] and movie [9, 10, 11, 12, 13] recommendation. Di Noia et al. took it a step further and also benefited from the integration of DBpedia in the linked open data (LOD) initiative: their movie recommendations are based not only on DBpedia knowledge, but also on Freebase and LinkedMDB. A more generic approach to create an RS using LOD was taken by Heitmann and Hayes [14], who also use LOD to overcome the cold-start problem. Even though their validation is based on a music dataset, their approach is generic enough to be used for other applications as well. Our approach for broader concept detection through DBpedia is a form of knowledge-based query expansion. Liang et al. already showed in [15] that document recommendation based on the user's interests improves as a result of query expansion, or semantic expansion as they call it.

What distinguishes our approach from other RS research is that we use both social media profiles and DBpedia data to create a generic RS. Passant and Raimond, for example, created an RS based on exported social media profiles and DBpedia data in [8], but their approach is limited to the music-specific relations in DBpedia. To the best of our knowledge, the only other generic approach is TasteWeights by Bostandjiev et al. [16]. They build a user profile based on social media data, and then apply a collaborative filtering-based approach to select recommendations. This still implies all three cold-start problem categories: new item, new user, and new community, again as described by Bobadilla et al. [3]. As it is exactly our goal to overcome the cold-start problem, our approach is a hybrid between content-based and knowledge-based, according to the RS classification by Burke and Ramezani [17]. Basile, Lops et al. would classify our work as a top-down semantics-aware content-based RS [18, 19].

Our work is inspired by Shi et al.'s HeteRecom [20], which is based on the similarity calculation HeteSim [21]. Similar to their work, our ultimate goal is to find the matching paths between a user and the item set that carry the most weight. In this paper, however, we focus on the detection of existing paths.

3. MOTIVATION
In this work, we aim to extract recommendations that are generic in three dimensions: the recommendation approach shall be independent of the item set domain, the item set language, and the used social medium. As a fourth criterion, it shall not suffer from any of Bobadilla's three cold-start problem categories. Below, we discuss the motivation for each of these challenges.

Domain-independence
As discussed in the previous section, most current recommender systems based on knowledge bases and social media are focused on one specific domain. Independence of the item set domain allows us to reuse the solution and its future improvements for multiple applications.

Language-independence
Similar to domain-independence as a requirement for reusability, a language-independent solution improves the RS's potential to be used in multiple applications. A sub-requirement of language-independence is synonym-independence. As Zanardi and Capra pointed out in [22], synonyms are a typical RS problem, especially for tag-based RSs. The example from Section 1 of people facebook-liking either the Soccer page or the Football page already showed that people may facebook-like different pages while referring to the same concept. Facebook has recently tried to merge pages about the same topic in different languages into one page. It has also improved its search functionality so that people find such pages when searching for their name in a different language. Despite these efforts, several pages still exist that describe similar concepts.

Social medium-independence
From the first form of genericity, domain-independence, follows another requirement. Several social media, such as Facebook, LinkedIn, Twitter, Instagram, and Pinterest, are widely used, and each of these has its own focus. When one decides to create an RS for job vacancies, LinkedIn may be a more logical social medium to base the recommendations on than any of the others, while an RS for touristic hotspots will most likely lead to another choice. Therefore, for an RS based on social media content to be domain-independent, it must also be independent of the underlying social medium.

Cold-start problem
The cold-start problem has been widely discussed in RS literature. Bobadilla et al. categorized it into three sub-categories: the new item problem, the new user problem, and the new community problem [3]. Knowledge-based RSs have been designed to overcome all of these problems, but often require domain-specific knowledge.
[Figure 1, one row per example (facebook-like → DBpedia resource → intermediate resource → broader concept → holiday home tag):
Colosseum → Colosseum → Vespasian → Rome → Rome
Pizza → Pizza → Calzone → Italy → Italy
Francesco Totti → Francesco Totti → A.S. Roma → Stadio Olimpico → Stadio Olimpico]
Figure 1: The IBRS concept, illustrated using the holiday home domain. A user's preferred items on social media are mapped onto knowledge base resources. Broader concepts are detected by exploring the knowledge base graph, and finally mapped onto tags in the item set database.
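The concept of Figure 1 can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the IBRS implementation:

- The triples, predicate names, holiday homes and coordinates are hypothetical toy data, loosely based on Figure 1.
- The graph is traversed only in the A → B → C direction chosen in Section 4.1.
- Tags are matched directly against the names of the detected concepts. Section 4.4 instead checks whether a tag occurs in the downloaded DBpedia abstracts.
- For the diversity step, the sketch swaps in the haversine (great-circle) distance with the 250 km threshold used for the holiday homes in Section 5.1. Section 4.4 only names the Euclidean distance or the cosine similarity of tags as options.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy triples (subject, predicate, object), loosely following the rows
# of Figure 1. Predicate names are illustrative only, not real DBpedia properties.
TRIPLES = [
    ("Colosseum", "relatedTo", "Vespasian"),
    ("Colosseum", "location", "Rome"),
    ("Vespasian", "relatedTo", "Rome"),
    ("Rome", "country", "Italy"),
    ("Pizza", "relatedTo", "Calzone"),
    ("Calzone", "country", "Italy"),
    ("Francesco_Totti", "team", "A.S._Roma"),
    ("A.S._Roma", "ground", "Stadio_Olimpico"),
]

# Hypothetical holiday homes: tags that already have a DBpedia resource,
# and (latitude, longitude) pairs used by the diversity step.
ITEMS = {
    "home_a": {"Rome", "Italy"},
    "home_b": {"Italy", "Stadio_Olimpico"},
    "home_c": {"Italy"},
    "home_d": {"Ardennes"},
}
COORDS = {
    "home_a": (41.90, 12.50),
    "home_b": (41.93, 12.45),
    "home_c": (45.44, 12.33),
    "home_d": (50.20, 5.60),
}


def build_adjacency(triples):
    """Map each subject to its objects, keeping the subject -> object direction."""
    adj = defaultdict(list)
    for subject, _predicate, obj in triples:
        adj[subject].append(obj)
    return adj


def second_neighbors(adj, start):
    """Count the paths start -> B -> C (the A -> B -> C direction of Section 4.1)."""
    counts = Counter()
    for b in adj.get(start, []):
        for c in adj.get(b, []):
            counts[c] += 1
    return counts


def detect_concepts(adj, liked_resources):
    """Sum the second-neighbor path counts over all liked resources."""
    concepts = Counter()
    for resource in liked_resources:
        concepts.update(second_neighbors(adj, resource))
    return concepts


def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon) pairs."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def rank_items(items, matched_tags, coords, min_km=250.0):
    """Rank items by their number of matched tags (ties broken by name), then
    drop every item closer than min_km to an item that was already kept."""
    ranked = sorted(items, key=lambda name: (-len(items[name] & matched_tags), name))
    kept = []
    for name in ranked:
        if all(haversine_km(coords[name], coords[k]) >= min_km for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    concepts = detect_concepts(adj, ["Colosseum", "Pizza", "Francesco_Totti"])
    print(concepts.most_common())
    print(rank_items(ITEMS, set(concepts), COORDS))
```

On this toy data, the three facebook-likes yield Italy (two paths), Rome and Stadio_Olimpico as concepts, matching the broader concepts of Figure 1. Homes home_a and home_b both match two tags. However, home_b lies only a few kilometers from the higher-ranked home_a, so the diversity step drops it, and the ranking becomes home_a, home_c, home_d. Two details are choices of this sketch only: each candidate is compared against the items already kept, and ties are broken by name.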
Overcoming all four of these challenges at the same time has motivated us to create IBRS: a domain-independent, language-independent, social medium-independent, knowledge-based RS.

4. CONCEPT & TECHNOLOGY
The foundation of IBRS is the idea that people are more likely to be interested in items that are not too distantly related to things we know they like. The things people express a preference for on social media are typically in a different domain than our item set. Even so, they may still give hints towards a person's interests. In IBRS, we link the preferred items on social media to resources in the DBpedia Resource Description Framework (RDF) graph. We use this graph to explore related concepts, which are then matched with a known tag set that is used to label the item set. As a final step, we rank the item set based on the number of matched tags. Figure 1 illustrates this concept for the holiday home domain. In this example, the user facebook-liked the Colosseum, pizza, and Francesco Totti. These facebook-likes are mapped onto DBpedia, and the DBpedia RDF graph is explored to detect the broader concepts Rome, Italy, and Stadio Olimpico. These concepts are mapped onto holiday home tags, to ultimately match the user with a specific holiday home.

The remainder of this section is structured as follows:
- Section 4.1 discusses RDF graph exploration.
- Section 4.2 presents the data model of the IBRS abstraction layer.
- Section 4.3 presents a method for automated tag generation from descriptions.
- Section 4.4 presents the ranking mechanism and the Facebook-DBpedia mapping approach.
- Section 4.5, finally, briefly introduces the IBRS prototype.

4.1 DBpedia graph exploration
After matching a facebook-like with a DBpedia resource, we traverse the RDF graph in exactly two steps. Since RDF triples have a subject, predicate, and object, RDF graphs are directed. Therefore, there are four possible direction combinations to travel from node A through node B to its second neighbor C.²

In Table 1, we show the top-10 second neighbors when traversing the DBpedia graph from the Eiffel Tower as node A, using all four direction combinations. DBpedia pages in italics also occur as tags in at least one of our two validation sets, which are discussed in detail in Section 5.
- The first approach, A → B → C, leads to results describing France, influential French people, and several other buildings in France.
- The second approach, A ← B → C, has some overlap with the first approach, but also contains several results unrelated to France, such as Los Angeles and the United States.
- The third approach, A ← B ← C, shows some remarkable buildings throughout Europe, but also very unrelated lists towards the bottom of the top-10.
- The fourth and final approach, A → B ← C, results in several famous French people, especially scientists.

Other starting points show similar results: the third approach, A ← B ← C, shows promising results for single-domain recommendations, whereas the first approach shows the best results for broader concept detection. Since our aim is to match these second neighbors with a tag set, we use the first approach, A → B → C.

² Depending on the directions of the relationships, and the existence of bi-directional relationships, node A may be equal to node C, as can also be seen in Table 1.

Rank | A → B → C (#) | A ← B → C (#) | A ← B ← C (#) | A → B ← C (#)
1 | Paris (20) | Eiffel Tower (41) | Eiffel Tower (7) | Paul Langevin (51)
2 | France (20) | France (17) | Palácio de Ferro (3) | Léon Foucault (48)
3 | Eiffel Tower (7) | Paris (15) | Cologne Cathedral (2) | Jean Témerson (48)
4 | Manuel Valls (6) | Los Angeles (4) | Eiffel Bridge, Ungheni (2) | Frédéric Passy (45)
5 | François Hollande (6) | British Library (4) | Souleuvre Viaduct (2) | L.A. de Bougainville* (45)
6 | Unitary state (6) | Bonnétable (4) | Samuel Hibben (2) | Cecile de Brunhoff (45)
7 | French language (6) | Aarhus University (4) | Casa de Fierro (2) | Adrien-Marie Legendre (45)
8 | Anne Hidalgo (6) | Garabit viaduct (4) | Modern Marvels episodes* (2) | Robert Perrier (45)
9 | Bonnétable (4) | St Paul's Cathedral (4) | Monopoly editions USA* (2) | Paul Lévy (math.)* (45)
10 | Garabit viaduct (4) | United States (4) | Garabit viaduct (2) | Émile Drain (45)
Table 1: Top-10 of second neighbor nodes C through DBpedia graph exploration in multiple directions for the Eiffel Tower resource as node A. Numbers in parentheses indicate the number of paths between that node and the Eiffel Tower node. Items in italics also occur as tags in at least one of our two validation tag sets. Items marked with an asterisk are abbreviated.

4.2 Abstraction layer data model
To ensure IBRS genericity, an abstraction layer is used on top of the underlying data source, such as a product database. This abstraction layer can consist of physical tables, views, or a mix thereof, but we will refer to its items as tables from here on. The abstraction layer contains two entity tables, abstract_items and tags, and one relationship table, abstract_items_tags, as depicted in Figure 2.

Figure 2: Abstraction layer data model

The abstract_items table contains the id and object_type
of the items in the item set. The object_type field allows us to use one IBRS instance for the recommendation of multiple item sets.

The tags table contains the tag's id, name, and dbpedia_resource_id. The name field can be used in the language of the item set tags. Since we have one item set that is tagged in Dutch and one item set that is tagged in English, we added the name_eng field for English tags. The dbpedia_resource_id is cached in the database for better performance.

The abstract_items_tags table is a regular relation table containing the abstract_item_id and tag_id. It also contains the abstract_item_type for more efficient joins.

4.3 Tag generation
If an item set is not tagged but does contain descriptive texts, tags can be extracted automatically. Natural language processing algorithms can be used for this purpose, such as the named entity extraction and disambiguation approach by Habib et al. [23]. We used Habib's approach with a manually trained model to extract named entities from holiday home descriptions. A drawback of this approach is that descriptions are often the result of free-text input. Phrases such as "only a 3 hour flight from Amsterdam" or "25 kilometers from the border with France" led to correctly extracted named entities. However, these entities are semantically not the best tags to distinguish this object from others. Therefore, we additionally removed those tags that tagged a holiday home with a country other than the one it is located in. In total, this approach allowed us to assign 455,777 (non-unique) tags to 42,148 holiday homes. Of these tags, 106,430 (12,151 unique) could be mapped onto a DBpedia resource.

4.4 Ranking
The IBRS ranking method consists of four steps: (1) retrieving preferred items from social media, (2) matching these items with DBpedia resources, (3) extracting abstracts from DBpedia, and (4) ranking items based on matched tags. For performance reasons, several items are cached offline.

Obtaining preferred items from social media
To map social media items while remaining independent of the social medium, we must take into account that not all APIs are the same. Some social medium APIs allow developers to find out what a user's friends prefer, while others limit the developer to information about the logged-in user. Therefore, when using the Facebook Graph API, we limited ourselves to the name and category elements of each facebook-liked page.

Matching social media items with DBpedia resources
Facebook-likes are mapped onto DBpedia resources through their name. Facebook pages that mapped onto ambiguous terms in DBpedia were filtered out. To create a more complete mapping, we used the category element to postfix the name of those pages for which the category element was filled with "movie," "tv show," or "musician/band." In these cases, we also checked if a page exists with the additional suffix " (movie)," " (TV series)," or " (band)," respectively. For a page named "The Net" in the category "movie," this leads to the following SPARQL query:

    PREFIX dbpont: <http://dbpedia.org/ontology/>
    PREFIX dbpres: <http://dbpedia.org/resource/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    # We use the prefixed versions here for readability; parentheses
    # in local names are backslash-escaped, as SPARQL 1.1 requires

    SELECT ?uri ?label
    WHERE {
      # Find exact match with category suffix
      { ?uri dbpont:wikiPageID [] .
        FILTER(?uri = dbpres:The_Net_\(movie\)) }
      # Or exact match without category suffix
      UNION { ?uri dbpont:wikiPageID [] .
        FILTER(?uri = dbpres:The_Net) }
      # Or the label version (DBpedia labels use spaces)
      UNION { ?uri rdfs:label "The Net"@en . }
      # Check if page has redirect
      UNION { dbpres:The_Net_\(movie\) dbpont:wikiPageRedirects ?uri }
      UNION { dbpres:The_Net dbpont:wikiPageRedirects ?uri }

      ?uri rdfs:label ?label .
      ?uri dbpont:wikiPageID ?wikiPageid .
      FILTER (langMatches(lang(?label), "en")) .
      # Filter out ambiguous terms
      FILTER NOT EXISTS { ?uri dbpont:wikiPageDisambiguates ?disambiguates } .
      # Filter out Wikipedia categories
      MINUS { ?uri rdf:type skos:Concept }
    }
    LIMIT 1

Using this approach on a test set of 11,674 unique Facebook pages, obtained from the likes of 309 users, we were able to match 2,240 (19.2%) Facebook pages with a DBpedia resource.

Extracting abstracts from DBpedia
For all matched DBpedia resources, the abstracts of their second neighbors (A → B → C) are retrieved from the SPARQL endpoint provided by DBpedia [24]. The retrieval uses the following query, with the mapped pages dbpres:Vienna, dbpres:Recommender_system, and dbpres:Computer_science as an example:

    PREFIX dbpont: <http://dbpedia.org/ontology/>
    PREFIX dbpres: <http://dbpedia.org/resource/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    SELECT DISTINCT ?o3 (COUNT(?o3) AS ?count) ?abstract ?label
    WHERE {
      # UNION concatenation of mapped FB pages
      { dbpres:Vienna ?p1 ?o2 } UNION
      { dbpres:Recommender_system ?p1 ?o2 } UNION
      { dbpres:Computer_science ?p1 ?o2 }

      # Neighboring object has Wikipage
      ?o2 dbpont:wikiPageID ?o2id ;
      # Neighboring object has neighbor
          ?p2 ?o3 .

      # Second neighbor object has Wikipage
      ?o3 dbpont:wikiPageID ?o3id ;
          dbpont:abstract ?abstract ;
          rdfs:label ?label .

      # English is used as an example
      FILTER(langMatches(lang(?abstract), 'en')) .
      FILTER(langMatches(lang(?label), 'en')) .
      # Second neighbor object must not be a category
      MINUS { ?o3 rdf:type skos:Concept }
    }
    # The aggregate requires the non-aggregated variables to be grouped
    GROUP BY ?o3 ?abstract ?label
    # 'Only' the 1000 most important abstracts
    ORDER BY DESC(?count)
    LIMIT 1000

Ranking items based on matched tags
Each tag that (1) has a dbpedia_resource_id and (2) is contained in at least one of the downloaded abstracts is marked as a matched tag. The item set is then ranked by the number of matched tags. As a final step, items that are too close to a higher-ranked item, based on a pre-defined distance function, are removed from the ranking. This last step ensures diversity among the recommended items. For geographic objects, as in a geo-social RS like the one discussed in [25], one can use the Euclidean distance. For more generic purposes, the cosine similarity of the items' tags (as discussed for example in [22]) may be a good starting point. The tag input makes our RS domain-aware. However, since the approach can be applied to any tag domain, we still consider the concept itself domain-independent. This is in contrast to, for example, music recommenders that rely on the artist-song relationship.

4.5 Prototype
For demonstration and validation purposes, we have created a prototype of IBRS, using the CakePHP platform. The prototype can be used either with one's own Facebook profile or by manually combining several DBpedia resources. It can be accessed through http://ibrs.ewi.utwente.nl.

5. VALIDATION
To validate our ranking mechanism, as well as to determine the user perception of recommendations with explanations, we validated IBRS in a carefully designed user study with a test user group of 44 people. We used two product sets from different domains to demonstrate its domain-independence: greeting cards and holiday homes. The greeting card set contains Dutch tags, while the holiday homes did not contain any tags, but only descriptions. From the holiday homes, we used the English descriptions to extract (English) tags, to emphasize the potential to use IBRS in a language-independent way.

This section is further structured as follows: Section 5.1 describes the item set details. In Section 5.2, we present the approach taken to validate both our ranking mechanism and the recommendation explanation interface. Section 5.3, finally, discusses the validation results.

5.1 Item set details
The first item set contains greeting cards from the Dutch company Kaartje2Go ("Card2Go"). People search through a collection of cards electronically, which Kaartje2Go distributes through regular (non-electronic) mail on behalf of the customer. Customers can choose between sending greeting cards to one person or to multiple people at once. 75% of the purchases are of the latter type, for which the preferences of the sender are more relevant than those of the (potentially many) recipients. To facilitate the search, users can search for tags that have been entered manually by Kaartje2Go employees. These tags, which are mostly in Dutch, are inconsistent in their completeness. For example, some of the soccer cards are also tagged with the names of popular Dutch soccer teams, but not all of them. Less popular teams are never mentioned as tags. The top-10 of the translated greeting card tags can be found in Table 2.

The second item set contains holiday homes from the holiday home portal EuroCottage. This item set did not contain tags, but a description in one, two, or three languages (Dutch, English and/or German). We followed the approach discussed in Section 4.3 to extract mentions of geographic places from the English holiday home descriptions. The top-10 of the resulting tags can be found in Table 3. The advantage of extracting geographic places is that these also often have Wikipedia pages. This makes them suitable for the requirement that the tags have a dbpedia_resource_id. Many of the holiday home descriptions were in German, even though the holiday home owners entered them into the system as English descriptions. As a result, many German words or phrases were extracted as
geographical references, since the model was trained for English descriptions. However, the impact of these terms was practically zero, as these extracted tags were not matched with an English DBpedia resource.³ For the validation, the holiday homes were plotted on a map that was zoomed in on Europe, since most holiday homes in the set are located there. A relatively small subset of homes outside Europe could therefore not be displayed on the map and was removed from the validation set, as were those without a coordinate pair. This coordinate pair was also used for the diversity function: all top-10 holiday homes had to be located at least 250 kilometers away from higher-ranked items.

³ Even though the approach can be applied to any language contained in the knowledge base, the tags are still matched with knowledge base resources in the tag language.

Tag | Frequency
Birthday | 7,535
Party | 4,200
Love | 2,521
Girl | 2,268
Boy | 2,084
Infant | 2,056
Photograph | 1,793
Marriage | 1,543
Cool | 1,381
Animals | 1,373
Table 2: Top-10 of (translated) manual greeting card tags with a DBpedia reference, ordered by the number of cards with this tag

Tag | Frequency
Florence | 760
Siena | 656
Mediterranean Sea | 634
Tuscany | 537
Legoland | 513
Venice | 508
Sotkamo | 448
Europe | 440
Ardennes | 421
Pisa | 363
Table 3: Top-10 of extracted tags for holiday homes with a DBpedia reference, ordered by the number of holiday homes with this tag

5.2 Validation approach
Our test users were recruited through Facebook and used their own existing Facebook account for the recommendations. The test users were not aware of what they were testing, except that they were testing an RS. Most test users do not have a background in computer science, and none of them were aware of how IBRS works. We asked our test users to validate our algorithm through a total of 30 questions, split up into three batches of 10. Once a question had been answered, users could not return to that question. The first two batches were intended to validate our ranking mechanism. The third batch was intended to determine the user perception of recommendations with explanations, as compared to recommendations without explanations.

For the first ten questions, users were asked to select their favorite greeting card from a greeting card pair using the interface of Figure 3. On one side of the screen, an item from the top-10 greeting cards according to IBRS was shown. On the other side, a card was shown that was not tagged with any of the matched tags. We called these recommendations Inverted IBRS. IBRS and Inverted IBRS were shown on the left or right side at random.

Figure 3: Validation interface for greeting card comparison

For the second batch of ten questions, our test users were presented with the choice between two holiday homes in a similar way. Again, IBRS and Inverted IBRS were shown on the left or right side at random. For each holiday home, its location was shown on a map, together with the name of the holiday home and the first 1000 characters of its description, as shown in Figure 4.

Figure 4: Validation interface for holiday home comparison

The final batch of ten questions required the test users to rate a recommendation. Each of these holiday homes was one of the top-10 holiday homes according to IBRS. Each user was randomly assigned either to the group of users who received recommendations with an explanation, as shown in Figure 5, or to the group without an explanation.

Figure 5: Cut-out of validation interface for holiday home recommendation rating. The lines in orange/blue contain the matched tags.

In test runs of the validation process, we found that in a set-wise comparison of the two systems, users tended to prefer a set that was spread out over the map to one that contained clusters of recommendations. Inverted IBRS is extremely spread out, because its items have no relation with the users or with each other, and this caused a bias in the validation results. Therefore, we decided to compare the results only item-wise. Furthermore, we removed tags with a negative connotation, such as "die" or "death."

5.3 Validation results
The first two batches of the validation were used to determine the potential of the IBRS ranking mechanism. The results are shown in the pie charts of Figure 6. Figure 6a shows which system was the test user's preferred system, based on a majority vote between the two systems. Most users participated in the validation of both the greeting card and the holiday home recommendations; for Figure 6a, each batch was counted separately. 47% of the users preferred IBRS, 22% voted equally often for both systems, and 31% preferred Inverted IBRS. The pie chart of Figure 6b shows the results when the holiday home and greeting card votes are combined per user. Since this increases the number of votes per user, ties are less common. In this scenario, 55% of the users preferred the IBRS results, while 34% preferred Inverted IBRS.

Figure 6: Most frequent choices per user for the first two batches of questions. (a) Split out between greeting cards and holiday homes (batches counted separately): IBRS 47%, Inverted IBRS 31%, tie 22%. (b) Overall (batches combined): IBRS 55%, Inverted IBRS 34%, tie 11%.

The final batch of the validation was used to determine the usefulness of the proposed recommendation explanation interface for holiday homes. The results of this batch are shown in the histograms of Figure 7. Contrary to our expectations, users preferred to receive recommendations without explanations. On the 5-point Likert scale, users who were presented with an interface with explanations rated the recommendations 3.3772 on average. Users without recommendation explanations rated them 3.4709 on average. From this validation, we can conclude the following: people who receive recommendations based on tags that do not describe them well are more likely to reject a recommendation with a "strongly disagree" when they see the rationale behind the recommendation.

Figure 7: Recommendation ratings split out by recommendation presentation interface (histograms of relative frequency per rating, 1 to 5). (a) With recommendation explanation; average rating: 3.3772. (b) Without recommendation explanation; average rating: 3.4709.

The system shows satisfying results in its potential to rank recommendations for users. Even so, we should not forget that many aspects that cannot (yet) be detected from Facebook profiles play a role in the decision-making. When choosing a greeting card, a holiday home, or anything else, one will always look at domain-specific item characteristics. For a greeting card, the user looks at colors, style, and the occasion the card is sent for. Similarly, for a holiday home, the user looks at the price, the number of beds, the picture of the home, and the distance to the beach. For this reason, this approach should only be used as a feature of a larger system.

6. CONCLUSION
In this paper, we presented the approach behind IBRS. We discussed the concept of mapping items marked as preferred or liked on social media onto a generic knowledge base, and query expansion using DBpedia. We presented the technology, including the abstraction layer, tag generation approach, and ranking mechanism. We also presented the validation results of a test user group. As stated before, we recommend using the proposed and validated approach from this paper as a feature of a larger recommender system. In a more complete system, one also needs to take domain-specific features, as well as item popularity and other collaborative filtering features, into account. However, these features would contradict our objective to create a generic RS that
overcomes the cold-start problem, and therefore were not Proceedings of the 8th international conference on
taken into account in this work. Intelligent user interfaces, pp. 263–266, ACM, 2003.
Currently, IBRS treats every path in the knowledge-base graph as an indication of a useful recommendation. However, some paths in the graph are actually a reason not to recommend an item. For example, in the holiday home domain, a user is less likely to book a home in his own town, even though there may be many paths between him and that holiday home based on his local likes. Furthermore, some nodes are more useful for recommendation than others. DBpedia nodes such as “Central European Time” have many incoming paths, although they are unlikely to reflect an actual interest of the user. The next step for IBRS is to further improve the ranking mechanism by incorporating these characteristics, and to explore the possibility of automatically detecting (negative) path weights.
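The two future-work ideas above, negative path weights and down-weighting of heavily connected nodes, can be illustrated with a small sketch. It is not the IBRS ranking mechanism: the toy triples, the hand-set relation weights (including a negative weight on a hypothetical `lives_in` relation), and the degree-based damping factor 1/(1 + ln deg) are all assumptions chosen for illustration. In this sketch, an item's score is the sum, over all simple paths of at most three edges between the user and the item, of the product of the relation weights along the path, multiplied by the damping factor of each intermediate node.

```python
from collections import defaultdict
from math import log

# Hypothetical relation weights (an assumption, not part of IBRS): most
# relations support a recommendation, but "lives_in" is negative so that
# paths through the user's own town count against a holiday home there.
RELATION_WEIGHT = {
    "likes": 1.0,
    "located_in": 1.0,
    "offers": 1.0,
    "time_zone": 1.0,
    "lives_in": -1.5,
}

# Toy knowledge-base fragment as (subject, relation, object) triples.
# Names are illustrative; the 20 extra homes turn the time-zone node into a hub.
TRIPLES = [
    ("user", "lives_in", "Enschede"),
    ("user", "likes", "FC_Twente"),
    ("user", "likes", "Scuba_diving"),
    ("FC_Twente", "located_in", "Enschede"),
    ("FC_Twente", "time_zone", "Central_European_Time"),
    ("home_Enschede", "located_in", "Enschede"),
    ("home_Enschede", "time_zone", "Central_European_Time"),
    ("home_coast", "offers", "Scuba_diving"),
    ("home_coast", "time_zone", "Central_European_Time"),
] + [(f"home_{i}", "time_zone", "Central_European_Time") for i in range(20)]


def build_graph(triples):
    """Index triples as an undirected adjacency list of (neighbour, relation)."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((obj, rel))
        graph[obj].append((subj, rel))
    return graph


def node_damping(graph, node):
    """1.0 for a node with a single edge, smaller for well-connected hubs."""
    return 1.0 / (1.0 + log(len(graph[node])))


def score(graph, user, item, max_len=3):
    """Sum the weights of all simple paths of at most max_len edges from user to item.

    A path's weight is the product of its relation weights (unknown relations
    count as 0), multiplied by the damping factor of every intermediate node.
    """
    total = 0.0
    # Depth-first search; each entry is (node, visited nodes, weight, edges used).
    stack = [(user, {user}, 1.0, 0)]
    while stack:
        node, visited, weight, length = stack.pop()
        for nxt, rel in graph[node]:
            if nxt in visited:
                continue
            step = weight * RELATION_WEIGHT.get(rel, 0.0)
            if nxt == item:
                total += step
            elif length + 1 < max_len:
                damped = step * node_damping(graph, nxt)
                stack.append((nxt, visited | {nxt}, damped, length + 1))
    return total


if __name__ == "__main__":
    kb = build_graph(TRIPLES)
    for candidate in ("home_coast", "home_Enschede"):
        print(candidate, round(score(kb, "user", candidate), 3))
```

Under these assumptions, the coastal home outranks the home in the user's own town, whose score becomes negative, and paths through the hub node contribute little. In a real system, the weights would have to be learned rather than set by hand, which is the open question raised above.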
7. ACKNOWLEDGEMENTS
This publication was supported by the Dutch national program COMMIT/. We also thank Mena Habib for his support in the tag generation process.
on Recommender systems, pp. 35–42, ACM, 2012.
8. REFERENCES [17] R. Burke, “Hybrid web recommender systems,” in The
[1] Facebook, “Facebook | photos.” adaptive web, pp. 377–408, Springer, 2007.
https://www.facebook.com/facebook, 2013. [18] P. Lops, “Semantics-aware content-based recommender
[2] S. Bakers, “Statistics of the top facebook pages.” systems,” 10 2014. Keynote at Workshop on New
http://www.socialbakers.com/statistics/ Trends in Content-based Recommender Systems.
facebook/pages/total/, 2013. [19] P. Basile, C. Musto, M. de Gemmis, P. Lops,
[3] J. Bobadilla, F. Ortega, A. Hernando, and F. Narducci, and G. Semeraro, “Content-based
A. Gutiérrez, “Recommender systems survey,” recommender systems + DBpedia knowledge =
Knowledge-Based Systems, vol. 46, pp. 109–132, 2013. semantics-aware recommender systems,” in Semantic
[4] D. Fijalkowski and R. Zatoka, “An architecture of a Web Evaluation Challenge, pp. 163–169, Springer,
web recommender system using social network user 2014.
profiles for e-commerce,” in Computer Science and [20] C. Shi, C. Zhou, X. Kong, P. S. Yu, G. Liu, and
Information Systems (FedCSIS), 2011 Federated B. Wang, “HeteRecom: A semantic-based
Conference on, pp. 287–290, IEEE, 2011. recommendation system in heterogeneous networks,”
[5] I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and in Proceedings of the 18th ACM SIGKDD
E. Uziel, “Social media recommendation based on international conference on Knowledge discovery and
people and tags,” in Proc. of the 33rd intern. ACM data mining, pp. 1552–1555, ACM, 2012.
SIGIR conference on Research and development in [21] C. Shi, X. Kong, Y. Huang, S. Y. Philip, and B. Wu,
information retrieval, pp. 194–201, ACM, 2010. “HeteSim: A general framework for relevance measure
[6] J. He and W. W. Chu, A social network-based in heterogeneous networks,” IEEE Transactions on
recommender system (SNRS). Springer, 2010. Knowledge & Data Engineering, no. 10,
[7] A. Passant, “dbrec - music recommendations using pp. 2479–2492, 2014.
DBpedia,” in The Semantic Web–ISWC 2010, [22] V. Zanardi and L. Capra, “Social ranking: uncovering
pp. 209–224, Springer, 2010. relevant content using tag-based recommender
[8] A. Passant and Y. Raimond, “Combining social music systems,” in Proceedings of the 2008 ACM conference
and semantic web for music-related recommender on Recommender systems, pp. 51–58, ACM, 2008.
systems,” in The 7th International Semantic Web [23] M. B. Habib and M. van Keulen, “Improving toponym
Conference, p. 19, Citeseer, 2008. disambiguation by iteratively enhancing certainty of
[9] R. Mirizzi, T. Di Noia, A. Ragone, V. C. Ostuni, and extraction,” in Proceedings of the 4th International
E. Di Sciascio, “Movie recommendation with Conference on Knowledge Discovery and Information
DBpedia,” in IIR, pp. 101–112, Citeseer, 2012. Retrieval, KDIR 2012, Barcelona, Spain, (Spain),
[10] J. Golbeck and J. Hendler, “Filmtrust: Movie pp. 399–410, SciTePress, October 2012.
recommendations using trust in web-based social [24] DBpedia, “SPARQL explorer for
networks,” in Proceedings of the IEEE Consumer http://dbpedia.org/sparql.”
communications and networking conference, vol. 96, http://dbpedia.org/snorql/, 2015.
pp. 282–286, University of Maryland, 2006. [25] V. de Graaff, M. van Keulen, and R. A. de By,
[11] B. N. Miller, I. Albert, S. K. Lam, J. A. Konstan, and “Towards geosocial recommender systems,” in 4th
J. Riedl, “MovieLens unplugged: experiences with an Intern. Workshop on Web Intelligence & Communities
occasionally connected recommender system,” in (WI&C 2012), Lyon, France, ACM, 2012.