Generic knowledge-based analysis of social media for recommendations

Victor de Graaff, Dept. of Computer Science, University of Twente, Enschede, The Netherlands, v.degraaff@utwente.nl
Anne van de Venis, Dept. of Computer Science, University of Twente, Enschede, The Netherlands, a.j.vandevenis@student.utwente.nl
Maurice van Keulen, Dept. of Computer Science, University of Twente, Enschede, The Netherlands, m.vankeulen@utwente.nl
Rolf A. de By, Fac. of Geo-Information Science & Earth Observation (ITC), University of Twente, Enschede, The Netherlands, r.a.deby@utwente.nl

ABSTRACT
Recommender systems have been around for decades to help people find the best matching item in a pre-defined item set. Knowledge-based recommender systems are used to match users with items based on information that links the two, but they often focus on a single, specific application, such as movies to watch or music to listen to. In this paper, we present our Interest-Based Recommender System (IBRS). This knowledge-based recommender system provides recommendations that are generic in three dimensions: IBRS is (1) domain-independent, (2) language-independent, and (3) independent of the used social medium. To match user interests with items, the former are derived from the user’s social media profile, enriched with a deeper semantic embedding obtained from the generic knowledge base DBpedia. These interests are used to extract personalized recommendations from a tagged item set from any domain, in any language. We also present the results of a validation of IBRS by a test user group of 44 people using two item sets from separate domains: greeting cards and holiday homes.

Keywords
Recommender systems, knowledge-based, DBpedia, social media, domain-independent, language-independent

General Terms
Algorithms, Design, Experimentation

Categories and Subject Descriptors
H.4.2 [Information Systems Applications]: Types of Systems - Decision support

CBRecSys 2015, September 20, 2015, Vienna, Austria. Copyright remains with the authors and/or original copyright holders.

1. INTRODUCTION
The aim of a recommender system (RS) is to help people find the items they are most interested in. A requirement to provide personalized recommendations is that the RS has knowledge of the person using it. In 2013, Facebook claimed to have 1.11 billion active users [1], and the top-100 pages alone currently have a total of 5.87 billion facebook-likes [2]. The items that people express a preference for on social media, whether through a like of a Facebook page, a follow on Twitter, or a tip on the renewed FourSquare, can be taken to disclose personal traits of interest and the things they want to be associated with. This vast amount of information is the starting point for our Interest-Based Recommender System (IBRS).

However, what people express a preference for on social media cannot always be related directly to commonly used tags or words in descriptions in an existing item set. These items are often example instances of broader concepts. For example: Cristiano Ronaldo has 103 million facebook-likes at the time of writing, whereas Soccer (66 million) and Football (46 million) have considerably fewer facebook-likes.¹ Tag sets or descriptions, on the other hand, are more likely to contain these broader concepts, as is the case for greeting cards, sports equipment, or campsites with soccer fields. In fact, one of our validation item sets contains tagged greeting cards with practically only generic terms such as soccer/football. To bridge this generalization gap in a domain- and language-independent way, we use the multilingual, generic knowledge base DBpedia to automatically detect broader concepts. We call these concepts the user’s interests. In this paper, we validate our hypothesis that automated user interest detection can also be used to select preferred items in an item set, independent of the item set domain, language and used social medium. As a boundary requirement to our solution, the cold-start problem, as for example discussed by Bobadilla et al. [3], needs to be circumvented. The system we propose shall be seen as a feature of a larger recommender system, either to bootstrap or to support that system, rather than as a stand-alone system.

¹ Synonyms like this one cause problems as well, and are discussed in more detail in Section 3.

In addition to the recommendation approach we propose in this paper, we also present the results of a validation thereof. A user group of 44 people tested our RS, using item sets from two completely different domains: greeting cards and holiday homes. Both the recommendation selection and the explanation interface were validated by these users, using their own social media profile.

This paper is further structured as follows: related work is discussed in Section 2, the motivation behind this research is discussed in Section 3, the IBRS technology is presented in Section 4, while the validation approach and results are laid out in Section 5, and Section 6 finally contains concluding remarks and hints at future work.

2. RELATED WORK
The creation of a RS that makes use of social media or DBpedia is not a new ambition. Social media have especially received much attention in the field of content-based recommender systems. Fijalkowski and Zatoka presented an architecture of a recommender system for e-commerce based on Facebook profiles [4]. Guy et al. proposed five recommender types, based on social media and/or tags [5]. In their approach, they also presented the users with recommendation explanations. The social media they focus on, however, are not of the mainstream type, but specific to the Lotus Connections suite. The system of He et al., on the other hand, uses common social media [6]. Whereas they claim to overcome the cold-start problem, their system appears to still suffer from the new item cold-start problem, as described by Bobadilla et al. [3].

The creation of a RS based on DBpedia has also received quite some attention already, especially in the field of music [7, 8] and movie [9, 10, 11, 12, 13] recommendation. Di Noia et al. took it a step further and also benefited from the integration of DBpedia in the linked open data (LOD) initiative. Their movie recommendations are not only based on DBpedia knowledge, but also on Freebase and LinkedMDB. A more generic approach to create a RS using LOD was taken by Heitmann and Hayes [14], who also use LOD to overcome the cold-start problem. Even though their validation is based on a music dataset, their approach has the genericity to be used for other applications as well. Our approach for broader concept detection through DBpedia is a form of knowledge-based query expansion. Liang et al. already showed in [15] that document recommendation based on the user’s interests improves as a result of query expansion, or semantic-expansion as they call it.

What distinguishes our approach from other RS research is that we use both social media profiles and DBpedia data to create a generic RS. Passant and Raimond, for example, created a RS based on exported social media profiles and DBpedia data in [8], but their approach is limited to the music-specific relations in DBpedia. To the best of our knowledge, the only other generic approach is TasteWeights by Bostandjiev et al. [16]. They build a user profile based on social media data, and then apply a collaborative filtering-based approach to select recommendations. Such an approach still suffers from all three cold-start problem categories: new item, new user, and new community, again as described by Bobadilla et al. [3]. As it is exactly our goal to overcome the cold-start problem, our approach is a hybrid between content-based and knowledge-based, according to the RS classification by Burke and Ramezani [17]. Basile, Lops et al. would classify our work as a top-down semantics-aware content-based RS [18, 19].

Our work is inspired by Shi et al.’s HeteRecom [20], which is based on the similarity calculation HeteSim [21]. Similar to their work, our ultimate goal is to find the matching paths between a user and the item set that carry the most weight. In this paper, however, we focus on the detection of existing paths.

3. MOTIVATION
In this work, we aim to extract recommendations that are generic in three dimensions: the recommendation approach shall be independent of the item set domain, the item set language, and the used social medium. As a fourth criterion, it shall not suffer from any of Bobadilla’s three cold-start problem categories. Below, we discuss the motivation for all of these challenges:

Domain-independence
As discussed in the previous section, most current recommender systems based on knowledge bases and social media are focused on one specific domain. Independence of the item set domain allows us to reuse the solution and its future improvements for multiple applications.

Language-independence
Similar to domain-independence as a requirement for reusability, a language-independent solution improves the RS’s potential to be used in multiple applications. A sub-requirement of language-independence is synonym-independence. As Zanardi and Capra pointed out in [22], synonyms are a typical RS problem, especially for tag-based RSs. The example from Section 1 of people facebook-liking either the Soccer page or the Football page already showed that people may facebook-like different pages while referring to the same concept. Despite recent efforts by Facebook to merge pages about the same topic from different languages into one page, and to improve the search functionality to help people find such pages while searching for their name in a different language, several pages still exist that describe similar concepts.

Social medium-independence
From the first form of genericity, domain-independence, follows another requirement. Several social media, such as Facebook, LinkedIn, Twitter, Instagram, and Pinterest, are widely used, and each of these has its own focus. When one decides to create a RS for job vacancies, LinkedIn may be a more logical social medium to base the recommendations on than any of the others, while a RS for touristic hotspots will most likely lead to another choice. Therefore, to create a RS based on social media content that is domain-independent, it shall also be independent of the underlying social medium.

Cold-start problem
The cold-start problem has been widely discussed in RS literature. Bobadilla et al. categorized it into three sub-categories: the new item problem, the new user problem, and the new community problem [3]. Knowledge-based RSs have been designed to overcome all of these problems, but often require domain-specific knowledge.

Figure 1: The IBRS concept, illustrated using the holiday home domain. A user’s preferred items on social media are mapped onto knowledge base resources. Broader concepts are detected by exploring the knowledge base graph, and finally mapped onto tags in the item set database. (Node labels in the figure: Colosseum, Vespasian, Rome, Pizza, Calzone, Italy, Francesco Totti, A.S. Roma, Stadio Olimpico.)

Overcoming all four of these challenges at the same time has motivated us to create IBRS: a domain-independent, language-independent, social medium-independent, knowledge-based RS.

4. CONCEPT & TECHNOLOGY
The foundation of IBRS is the idea that people are more likely to be interested in items that have a not too distant relation with things we know they like. Although things people express a preference for on social media are typically in a different domain than our item set, they may still give hints towards a person’s interests. In IBRS, we link the preferred items on social media to resources in the DBpedia Resource Description Framework (RDF) graph. We use this graph to explore related concepts, which are then matched with a known tag set that is used to label the item set. As a final step, we rank the item set based on the number of matched tags. This concept is illustrated, using the holiday home domain, in Figure 1. In this example, the user facebook-liked the Colosseum, pizza, and Francesco Totti. These facebook-likes are mapped onto DBpedia, and the DBpedia RDF graph is explored to detect the broader concepts Rome, Italy, and Stadio Olimpico. These items are mapped onto holiday home tags, to ultimately match the user with a specific holiday home.

The remainder of this section is structured as follows: RDF graph exploration is discussed in Section 4.1. The data model of the IBRS abstraction layer is presented in Section 4.2. Section 4.3 presents a method for automated tag generation from descriptions. In Section 4.4 the ranking mechanism and Facebook-DBpedia mapping approach are presented. Section 4.5, finally, presents a short introduction of the IBRS prototype.

4.1 DBpedia graph exploration
After matching a facebook-like with a DBpedia resource, we traverse the RDF graph in exactly two steps. Since RDF tuples have a subject, predicate and object, RDF graphs are directed. Therefore, there are four possible direction combinations to travel from node A through node B to its second neighbor C.² In Table 1, we show the top-10 of second neighbors when traversing the DBpedia graph starting from the Eiffel Tower as node A, using all four possible direction combinations. DBpedia pages in italics also occur as tags in at least one of our two validation sets, which are discussed in detail in Section 5. The first approach, A → B → C, leads to results describing France, influential French people, and several other buildings in France. The second approach, A ← B → C, has some overlap with the first approach, but also contains several results unrelated to France, such as Los Angeles and the United States. The third approach, A ← B ← C, shows some remarkable buildings throughout Europe, but also very unrelated lists towards the bottom of the top-10. The fourth and final approach, A → B ← C, results in several famous French people, especially scientists. Other starting points show similar results: the third approach, A ← B ← C, shows promising results for single domain recommendations, whereas the first approach shows the best results for broader concept detection. Since our aim is to match these second neighbors with a tag set, we use the first approach, A → B → C.

² Depending on the directions of the relationships, and the existence of bi-directional relationships, node A may be equal to node C, as can also be seen in Table 1.

4.2 Abstraction layer data model
To ensure IBRS genericity, an abstraction layer is used on top of the underlying data source, such as a product database. This abstraction layer can consist of physical tables, views, or a mix thereof, but we will refer to its items as tables from here on. The abstraction layer contains two entity tables, abstract_items and tags, and one relationship table, abstract_items_tags, as depicted in Figure 2.

Figure 2: Abstraction layer data model

The abstract_items table contains the id and object_type
of the items in the item set. The object_type field allows us to use one IBRS instance for the recommendation of multiple item sets.

The tags table contains the tag’s id, name, and dbpedia_resource_id. The name field can be used in the language of the item set tags. Since we have one item set that is tagged in Dutch, and one item set that is tagged in English, we added the name_eng field for English tags. The dbpedia_resource_id is cached in the database for better performance.

The abstract_items_tags table is a regular relation table containing the abstract_item_id and tag_id. It also contains the abstract_item_type for improved join executions.

Rank  A → B → C (#)          A ← B → C (#)            A ← B ← C (#)                 A → B ← C (#)
1     Paris (20)             Eiffel Tower (41)        Eiffel Tower (7)              Paul Langevin (51)
2     France (20)            France (17)              Palácio de Ferro (3)          Léon Foucault (48)
3     Eiffel Tower (7)       Paris (15)               Cologne Cathedral (2)         Jean Témerson (48)
4     Manuel Valls (6)       Los Angeles (4)          Eiffel Bridge, Ungheni (2)    Frédéric Passy (45)
5     François Hollande (6)  British Library (4)      Souleuvre Viaduct (2)         L.A. de Bougainville* (45)
6     Unitary state (6)      Bonnétable (4)           Samuel Hibben (2)             Cecile de Brunhoff (45)
7     French language (6)    Aarhus University (4)    Casa de Fierro (2)            Adrien-Marie Legendre (45)
8     Anne Hidalgo (6)       Garabit viaduct (4)      Modern Marvels episodes* (2)  Robert Perrier (45)
9     Bonnétable (4)         St Paul’s Cathedral (4)  Monopoly editions USA* (2)    Paul Lévy (math.)* (45)
10    Garabit viaduct (4)    United States (4)        Garabit viaduct (2)           Émile Drain (45)

Table 1: Top-10 of second neighbor nodes C through DBpedia graph exploration in multiple directions for the Eiffel Tower resource as node A. Numbers between brackets indicate the number of paths between that node and the Eiffel Tower node. Items in italics also occur as tags in at least one of our two validation tag sets. Items marked with an asterisk are abbreviated.

4.3 Tag generation
In case an item set is not tagged, but does contain descriptive texts, tags can be extracted automatically. Natural language processing algorithms can be used for this purpose, such as the named entity extraction and disambiguation approach by Habib et al. [23]. We used Habib’s approach with a manually trained model to extract named entities from holiday home descriptions. A drawback of this approach is that descriptions are often the result of free-text input. Phrases such as “only a 3 hour flight from Amsterdam” or “25 kilometers from the border with France” led to correctly extracted named entities that are nevertheless semantically poor tags for distinguishing this object from others. Therefore, we additionally removed those tags that tagged a holiday home with another country than the one it is located in. In total, this approach allowed us to assign 455,777 (non-unique) tags to 42,148 holiday homes, of which 106,430 tags (12,151 of them unique) could be mapped onto a DBpedia resource.

4.4 Ranking
The IBRS ranking method consists of four steps: (1) retrieving preferred items from social media, (2) matching these items with DBpedia resources, (3) extracting abstracts from DBpedia, (4) ranking items based on matched tags. For performance reasons, several items are cached offline.

Obtaining preferred items from social media
To map social media items while remaining independent of the social medium, we must take into account that not all APIs are the same. Some social medium APIs allow developers to find out what a user’s friends prefer, while others limit the developer to information about the logged-in user. Therefore, when using the Facebook Graph API, we limited ourselves to the name and category elements of each facebook-liked page.

Matching social media items with DBpedia resources
Facebook-likes are mapped onto DBpedia resources through their name. Those facebook-pages that mapped onto ambiguous terms in DBpedia were filtered out. To create a more complete mapping, we used the category element to postfix the name of those pages for which the category element was filled with “movie,” “tv show,” or “musician/band.” In these cases, we also checked if a page exists with the additional suffix “ (movie),” “ (TV series),” or “ (band)” respectively. This leads to the following SPARQL query:

    PREFIX dbpont: <http://dbpedia.org/ontology/>
    PREFIX dbpres: <http://dbpedia.org/resource/>
    # We use the prefixed versions here for readability

    SELECT ?uri ?label
    WHERE {
      # Find exact match with category suffix
      { ?uri dbpont:wikiPageID [].
        FILTER(?uri = dbpres:The_Net_(movie)) }

      # Or exact match without category suffix
      UNION { ?uri dbpont:wikiPageID [].
              FILTER(?uri = dbpres:The_Net) }

      # Or the label version
      UNION { ?uri rdfs:label "The_Net"@en. }

      # Check if page has redirect
      UNION { dbpres:The_Net_(movie)
              dbpont:wikiPageRedirects ?uri }
      UNION { dbpres:The_Net
              dbpont:wikiPageRedirects ?uri }

      ?uri rdfs:label ?label.
      ?uri dbpont:wikiPageID ?wikiPageid.
      FILTER (langMatches(lang(?label),"en")).

      # Filter out ambiguous terms
      FILTER NOT EXISTS { ?uri
        dbpont:wikiPageDisambiguates ?disambiguates } .

      # Filter out Wikipedia categories
      MINUS { ?uri rdf:type skos:Concept }
    } LIMIT 1

Using this approach on a test set of 11,674 unique Facebook pages, obtained from the likes of 309 users, we were able to match 2,240 (19.2%) Facebook-pages with a DBpedia resource.

Extracting abstracts from DBpedia
For all matched DBpedia resources, the abstracts are retrieved from the SPARQL endpoint provided by DBpedia [24] using the following query:

    PREFIX dbpont: <http://dbpedia.org/ontology/>
    PREFIX dbpres: <http://dbpedia.org/resource/>

    SELECT DISTINCT
      ?o3 (count(?o3) as ?count) ?abstract ?label

    WHERE {
      # UNION concatenation of mapped FB pages
      {dbpres:Vienna ?p1 ?o2} UNION
      {dbpres:Recommender_system ?p1 ?o2} UNION
      {dbpres:Computer_science ?p1 ?o2}

      # Neighboring object has Wikipage
      ?o2 dbpont:wikiPageID ?o2id ;
      # Neighboring object has neighbor
          ?p2 ?o3 .

      # Second neighbor object has Wikipage
      ?o3 dbpont:wikiPageID ?o3id ;
          dbpont:abstract ?abstract ;
          rdfs:label ?label .

      # English is used as an example
      FILTER(langMatches(lang(?abstract), 'en')) .
      FILTER(langMatches(lang(?label), 'en')) .

      # Second neighbor object must not be a category
      MINUS {?o3 rdf:type skos:Concept}
    }

    # 'Only' the 1000 most important abstracts
    ORDER BY DESC(?count)
    LIMIT 1000

Ranking items based on matched tags
Each tag that (1) has a dbpedia_resource_id and (2) is contained in at least one of the downloaded abstracts is marked as a matched tag. The item set is then ranked on the basis of the number of matched tags. As a final step, those items that are too close to a higher ranked item, based on a pre-defined distance function, are removed from the ranking. This last step is added to ensure diversity among the recommended items. For the recommendation of geographic objects, as for example in a geo-social RS like the one discussed in [25], one can think of the Euclidean distance, but for more generic purposes the cosine similarity (as for example discussed in [22]) of the item’s tags may be a good starting point. The tag input makes our RS domain-aware. However, since the approach can be applied to any tag domain, we still consider the concept itself domain-independent. This is in contrast to, for example, music recommenders that rely on the artist-song relationship.

4.5 Prototype
For demonstration and validation purposes, we have created a prototype of IBRS, using the CakePHP platform. The prototype can be used with either one’s own Facebook profile, or by manually combining several DBpedia resources. It can be accessed through http://ibrs.ewi.utwente.nl.

5. VALIDATION
To validate our ranking mechanism, as well as to determine the user perception of recommendations with explanations, we validated IBRS in a carefully designed user study with a test user group of 44 people. We used two product sets from different domains to demonstrate its domain-independence: greeting cards and holiday homes. The greeting card set contains Dutch tags, while the holiday homes did not contain any tags, but only descriptions. From the holiday homes, we used the English descriptions to extract (English) tags, to emphasize the potential to use IBRS in a language-independent way.

This section is further structured as follows: Section 5.1 describes the item set details. In Section 5.2, we present the approach taken to validate both our ranking mechanism and the recommendation explanation interface. Section 5.3, finally, discusses the validation results.

5.1 Item set details
The first item set contains greeting cards from the Dutch company Kaartje2Go (“Card2Go”). People search electronically through a collection of cards, which are distributed through regular (non-electronic) mail by Kaartje2Go on behalf of the customer. The customers can choose between sending greeting cards to one or multiple people at once. 75% of the purchases are of the latter type, for which the preferences of the sender are more relevant than those of the (potentially many) recipients. To facilitate the search, users can search for tags that have been entered manually by the Kaartje2Go employees. These tags, which are mostly in Dutch, are inconsistent in their completeness: for example, some of the soccer cards are also tagged using the names of popular Dutch soccer teams, but not all of them. Less popular teams are never mentioned as tags. The top-10 of the translated greeting card tags can be found in Table 2.

Tag          Frequency
Birthday     7,535
Party        4,200
Love         2,521
Girl         2,268
Boy          2,084
Infant       2,056
Photograph   1,793
Marriage     1,543
Cool         1,381
Animals      1,373

Table 2: Top-10 of (translated) manual greeting card tags with a DBpedia reference, ordered by the number of cards with this tag

The second item set contains holiday homes from the holiday home portal EuroCottage. This item set did not contain tags, but a description in one, two or three languages (Dutch, English and/or German). We followed the approach discussed in Section 4.3 to extract mentions of geographic places from the English holiday home descriptions. The top-10 of resulting tags can be found in Table 3. The advantage of extracting geographic places is that these also often have Wikipedia pages, which makes them suitable for the requirement that the tags need to have a dbpedia_resource_id. Many pages of the holiday home descriptions were in German, even though they were entered into the system by the holiday home owners as English descriptions. As a result, many German words or phrases were extracted as geographical references, since the model was trained for English descriptions. However, the impact of these terms was practically zero, as these extracted tags were not matched with an English DBpedia resource.³ For the validation, the holiday homes were plotted on a map that was zoomed in on Europe, since most holiday homes in the set are located there. A relatively small subset of homes outside Europe could therefore not be displayed on the map, and were removed from the validation set, just as those without a coordinate pair. This coordinate pair was also used for the diversity function: all top-10 holiday homes had to be located at least 250 kilometers away from higher ranked items.

³ Even though the approach can be applied to any language contained in the knowledge base, the tags are still matched with knowledge base resources in the tag language.

Tag                 Frequency
Florence            760
Siena               656
Mediterranean Sea   634
Tuscany             537
Legoland            513
Venice              508
Sotkamo             448
Europe              440
Ardennes            421
Pisa                363

Table 3: Top-10 of extracted tags for holiday homes with a DBpedia reference, ordered by the number of holiday homes with this tag

5.2 Validation approach
Our test users were requested to participate through Facebook, and used their own existing Facebook account for the recommendations. The test users were not aware of what they were testing, except for the information that they were testing a RS. Most test users do not have a background in computer science, and none of them were aware of how IBRS works. We asked our test users to validate our algorithm through a total of 30 questions, split up into three batches of 10. Once a question had been answered, users could not return to that question. The first two batches were intended to validate our ranking mechanism; the third batch was intended to determine the user perception of recommendations with explanations, as compared to recommendations without explanations.

For the first ten questions, users were asked to select their favorite greeting card from a greeting card pair using the interface of Figure 3. On one side of the screen, an item from the top-10 greeting cards according to IBRS was shown. On the other side, a card was shown that was not tagged with any of the matched tags. We called these recommendations Inverted IBRS. IBRS and Inverted IBRS were shown on the left or right side at random.

Figure 3: Validation interface for greeting card comparison

For the second batch of ten questions, our test users were presented with the choice between two holiday homes, in a similar way. Again, IBRS and Inverted IBRS were shown on the left or right side at random. For each holiday home, its location was shown on a map, with the name of the holiday home and the first 1000 characters of its description, as shown in Figure 4.

Figure 4: Validation interface for holiday home comparison

The final batch of ten questions required the test users to rate a recommendation. Each of the holiday homes was one of the top-10 holiday homes according to IBRS. At random, a user was assigned to the group of users who received recommendations with an explanation, as shown in Figure 5, or without an explanation.

Figure 5: Cut-out of validation interface for holiday home recommendation rating. The lines in orange/blue contain the matched tags.

In test runs of the validation process, we determined that in a set-wise comparison of the two systems, users tended to prefer the set that was spread out over the map, rather than one that contained clusters of recommendations. Since Inverted IBRS is extremely spread out, due to the fact that items had no relation with the users or each other, this caused a bias in the validation results. Therefore, we decided to only compare the results item-wise. Furthermore, we removed tags with a negative connotation, such as “die,” or “death.”

5.3 Validation results
The first two batches of the validation were used to determine the potential of the IBRS ranking mechanism. The results are shown in the pie charts of Figure 6. Figure 6a shows which system was the test user’s preferred system, based on a majority vote between the two systems. Most users participated in the validation of both the recommendation of greeting cards and holiday homes. Each batch was counted separately. 47% of the users preferred IBRS, 22% voted equally often for both of the systems, and 31% of the users preferred Inverted IBRS. In the pie chart of Figure 6b, the results are shown when the results of holiday homes and greeting cards are combined per user. Since this increases the number of votes per user, ties are less common. In this scenario, 55% of the users preferred the IBRS results, while 34% preferred Inverted IBRS.

Figure 6: Most frequent choices per user for the first two batches of questions. (a) Split out between greeting cards and holiday homes (batches counted separately): IBRS 47%, Tie 22%, Inverted IBRS 31%. (b) Overall (batches combined): IBRS 55%, Inverted IBRS 34%, Tie 11%.

The final batch of the validation was used to determine the usefulness of the proposed recommendation explanation interface for holiday homes. The results of this batch are shown in the histograms of Figure 7. Contrary to our expectations, users preferred to receive recommendations without explanations. Using the 5-point Likert scale, the users who were presented with an interface with explanations rated the recommendations with an average score of 3.3772, while users without recommendation explanation rated the recommendations with a 3.4709 on average. From this validation, we can conclude that people who receive recommendations based on tags that do not describe them well are more likely to reject a recommendation with a “strongly disagree” when they see the rationale behind the recommendation.

Figure 7: Recommendation ratings split out by recommendation presentation interface (histograms of relative frequency per rating, 1 to 5). (a) With recommendation explanation; average rating: 3.3772. (b) Without recommendation explanation; average rating: 3.4709.

Despite satisfying results with respect to the system’s potential to rank recommendations for users, we should not forget that many aspects play a role in the decision-making that cannot (yet) be detected from Facebook profiles. When choosing either a greeting card, a holiday home, or anything else, one will always look at domain-specific item characteristics. For a greeting card, the user looks at colors, style, and the occasion the card is sent for. Similarly, for a holiday home, the user looks at price, number of beds, the picture of the home, and the distance to the beach. For this reason, this approach shall only be used as a feature of a larger system.

6. CONCLUSION
In this paper, we presented the approach behind IBRS. We discussed the concept of mapping items marked as preferred or liked in social media onto a generic knowledge base, and query expansion using DBpedia. We presented the technology, including the abstraction layer, tag generation approach, and ranking mechanism. We also presented the validation results of a test user group. As said, we recommend using the proposed and validated approach from this paper as a feature of a larger recommender system. In a more complete system, one also needs to take domain-specific features, as well as item popularity and other collaborative filtering features, into account. However, these features would conflict with our objective to create a generic RS that overcomes the cold-start problem, and therefore were not taken into account in this work.

Currently, IBRS uses all paths in the knowledge base graph as an indication for a useful recommendation. However, some paths in the graph actually form a reason not to recommend that item. For example, in the holiday home domain, a user is less likely to book a home in their own town, even though there may be many paths between the user and that holiday home based on local likes. Furthermore, some nodes are more useful than others for recommendation. DBpedia nodes like “European Central Time” have a lot of incoming paths, while it is unlikely that this actually forms an interest for this user. The next step for IBRS is to further improve the ranking mechanism by incorporating these characteristics and to explore the possibility to automatically detect (negative) weights of paths.

7. ACKNOWLEDGEMENTS
This publication was supported by the Dutch national program COMMIT/. We also thank Mena Habib for his support in the tag generation process.

8. REFERENCES
[1] Facebook, “Facebook | photos.” https://www.facebook.com/facebook, 2013.
[2] S. Bakers, “Statistics of the top facebook pages.” http://www.socialbakers.com/statistics/facebook/pages/total/, 2013.
[3] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez, “Recommender systems survey,” Knowledge-Based Systems, vol. 46, pp. 109–132, 2013.
[4] D. Fijalkowski and R. Zatoka, “An architecture of a web recommender system using social network user profiles for e-commerce,” in Computer Science and
Proceedings of the 8th international conference on Intelligent user interfaces, pp. 263–266, ACM, 2003.
[12] V. C. Ostuni, T. Di Noia, R. Mirizzi, D. Romito, and E. Di Sciascio, “Cinemappy: a context-aware mobile app for movie recommendations boosted by DBpedia,” SeRSy, vol. 919, pp. 37–48, 2012.
[13] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos, “Moviexplain: a recommender system with explanations,” in Proceedings of the third ACM conference on Recommender systems, pp. 317–320, ACM, 2009.
[14] B. Heitmann and C. Hayes, “Using linked data to build open, collaborative recommender systems,” in AAAI spring symposium: linked data meets artificial intelligence, pp. 76–81, 2010.
[15] T.-P. Liang, Y.-F. Yang, D.-N. Chen, and Y.-C. Ku, “A semantic-expansion approach to personalized knowledge recommendation,” Decision Support Systems, vol. 45, no. 3, pp. 401–412, 2008.
[16] S. Bostandjiev, J. O’Donovan, and T. Höllerer, “TasteWeights: a visual interactive hybrid recommender system,” in Proc. of the 6th ACM conf. on Recommender systems, pp. 35–42, ACM, 2012.
[17] R. Burke, “Hybrid web recommender systems,” in The adaptive web, pp. 377–408, Springer, 2007.
[18] P. Lops, “Semantics-aware content-based recommender systems,” Oct. 2014. Keynote at Workshop on New Trends in Content-based Recommender Systems.
[19] P. Basile, C. Musto, M. de Gemmis, P. Lops, F. Narducci, and G. Semeraro, “Content-based recommender systems + DBpedia knowledge = semantics-aware recommender systems,” in Semantic Web Evaluation Challenge, pp. 163–169, Springer, 2014.
[20] C.
Shi, C. Zhou, X. Kong, P. S. Yu, G. Liu, and Information Systems (FedCSIS), 2011 Federated B. Wang, “HeteRecom: A semantic-based Conference on, pp. 287–290, IEEE, 2011. recommendation system in heterogeneous networks,” [5] I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and in Proceedings of the 18th ACM SIGKDD E. Uziel, “Social media recommendation based on international conference on Knowledge discovery and people and tags,” in Proc. of the 33rd intern. ACM data mining, pp. 1552–1555, ACM, 2012. SIGIR conference on Research and development in [21] C. Shi, X. Kong, Y. Huang, S. Y. Philip, and B. Wu, information retrieval, pp. 194–201, ACM, 2010. “HeteSim: A general framework for relevance measure [6] J. He and W. W. Chu, A social network-based in heterogeneous networks,” IEEE Transactions on recommender system (SNRS). Springer, 2010. Knowledge & Data Engineering, no. 10, [7] A. Passant, “dbrec - music recommendations using pp. 2479–2492, 2014. DBpedia,” in The Semantic Web–ISWC 2010, [22] V. Zanardi and L. Capra, “Social ranking: uncovering pp. 209–224, Springer, 2010. relevant content using tag-based recommender [8] A. Passant and Y. Raimond, “Combining social music systems,” in Proceedings of the 2008 ACM conference and semantic web for music-related recommender on Recommender systems, pp. 51–58, ACM, 2008. systems,” in The 7th International Semantic Web [23] M. B. Habib and M. van Keulen, “Improving toponym Conference, p. 19, Citeseer, 2008. disambiguation by iteratively enhancing certainty of [9] R. Mirizzi, T. Di Noia, A. Ragone, V. C. Ostuni, and extraction,” in Proceedings of the 4th International E. Di Sciascio, “Movie recommendation with Conference on Knowledge Discovery and Information DBpedia,” in IIR, pp. 101–112, Citeseer, 2012. Retrieval, KDIR 2012, Barcelona, Spain, (Spain), [10] J. Golbeck and J. Hendler, “Filmtrust: Movie pp. 399–410, SciTePress, October 2012. 
recommendations using trust in web-based social [24] DBpedia, “SPARQL explorer for networks,” in Proceedings of the IEEE Consumer http://dbpedia.org/sparql.” communications and networking conference, vol. 96, http://dbpedia.org/snorql/, 2015. pp. 282–286, University of Maryland, 2006. [25] V. de Graaff, M. van Keulen, and R. A. de By, [11] B. N. Miller, I. Albert, S. K. Lam, J. A. Konstan, and “Towards geosocial recommender systems,” in 4th J. Riedl, “MovieLens unplugged: experiences with an Intern. Workshop on Web Intelligence & Communities occasionally connected recommender system,” in (WI&C 2012), Lyon, France, ACM, 2012.
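
As an illustration of the per-user majority vote behind Figure 6 (Section 5.3), the following minimal Python sketch shows how such an aggregation could be computed, both with batches counted separately and with batches combined per user. The vote labels, function names, batch names, and example votes are assumptions made for this sketch; they are not the validation data or the actual IBRS implementation.

```python
from collections import Counter

# Labels are illustrative assumptions, not identifiers from the IBRS code base.
IBRS, INVERTED, TIE = "IBRS", "Inverted IBRS", "Tie"


def preferred_system(votes):
    """Return the system chosen most often in `votes`, or TIE on equal counts."""
    counts = Counter(votes)
    if counts[IBRS] > counts[INVERTED]:
        return IBRS
    if counts[INVERTED] > counts[IBRS]:
        return INVERTED
    return TIE


def preference_shares(votes_by_key):
    """Fraction of keys (users, or user/batch pairs) per majority-vote outcome."""
    outcomes = Counter(preferred_system(v) for v in votes_by_key.values())
    total = sum(outcomes.values())
    if total == 0:
        return {IBRS: 0.0, TIE: 0.0, INVERTED: 0.0}
    return {label: outcomes[label] / total for label in (IBRS, TIE, INVERTED)}


def combine_batches(votes_by_user_batch):
    """Merge the votes of all batches of the same user into one list per user."""
    combined = {}
    for (user, _batch), votes in votes_by_user_batch.items():
        combined.setdefault(user, []).extend(votes)
    return combined


# Hypothetical item-wise votes keyed by (user, batch); NOT the study's data.
example_votes = {
    ("u1", "greeting cards"): [IBRS, INVERTED],
    ("u1", "holiday homes"): [IBRS, IBRS],
    ("u2", "greeting cards"): [INVERTED, INVERTED],
    ("u2", "holiday homes"): [IBRS, INVERTED],
}

separate = preference_shares(example_votes)                   # as in Figure 6a
combined = preference_shares(combine_batches(example_votes))  # as in Figure 6b
```

In this made-up example, half of the separately counted batches end in a tie, while combining the batches per user leaves no ties, which mirrors the observation that more votes per user make ties less common.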