-

Content- and Graph-based Tag Recommendation: Two Variations

Johannes Mrosek

mrosek@internet-sicherheit.de 0

Stefan Bussmann

Hendrik Albers

Kai Posdziech

kaip@gmx.net 0

Benedikt Hengefeld

hengefeld@web.de 0

Nils Opperman

nilsoppermann@web.de 0

Stefan Robert

Gerrit Spira

gerrit.spira@gmx.de 0 0 FH Gelsenkirchen, Department of Computer Sciences , Neidenburger Strasse 43, 45877 Gelsenkirchen , Germany

We describe two variants of our approach to tackle the task 1 & 2 of the ECML PKDD Discovery Challenge 2009 where each contenter had to identify up to 5 tags for each resource of a given set of either bibtex-like references to publications or bookmarks. The quality of the results was measured against the tags that users of the data source (www.bibsonomy.org) had originally assigned to the resources (F1 measure). In our approach, we either generate tags (from the content of the given resource data or after crawling additional resources) or we request tags from tagging services. We call each of this tag sources a tag recommender. We then combine the results of the tag recommenders based on weighting factors. The weighting factors are determined experimentally by comparing generated and expected tags based on the available training data. This general idea is also used for the graph-based approach required to solve task 2. Here again, the nal tag recommendations are computed from the individual results of the di erent tag-recommending algorithms. In the preliminary result list, we ranked second for task 1 (Group 2) and nineth for task 2 (Group 1).

content-based graph-based tag recommendation bibsonomy

Assigning tags to resources can be an e ective instrument to organize an information space. Users interacting with this information space may utilize the tags to identify relevant resources or groups of resources. User-driven tag assignment is a popular way to ensure at least some sort of tag quality1. Often, the users assigning tags are supported by algorithmic tag recommendation. This may be as simple as o ering auto-complete elds for entering tags that display tags already assigned by users, or it may be based on a full- edged analysis of the resource 1 in the sense that the tags indeed help users to nd or organize resources, for recent results on tag quality compare [ 2, 3 ]. the user wants to tag. This analysis may present a set of automatically identi ed tags to the user. Within the later context, compare [ 1 ], we describe two variants of our approach to content-based tag recommendation which we applied to task 1 and 2 of the ECML PKDD Discovery Challenge 2009. We acted as two teams (with completely independent implementations), each team deploying its own variation of the overall approach { and each with di erent success. In the following we rst describe the solution and results of group 12 for tasks 1. This will be followed by a brief presentation of the di erences of the solution for task 1 of group 23 and their results. We conclude with details of the solution for task 2 of group 1. 2

Content-based Tag Recommendation

This attempt on content-based tag recommendation uses di erent sources to generate tag candidates. These candidates are combined on the basis of appropriate weighting factors which have been assigned to the particular sources ex ante. The rst kind of source are web services, which o er information for known resources. This information contains tags which already have been assigned to those resources. In this case we query del.icio.us and citeulike.org. The second kind of source are the resources themselves. In case of bookmarks the content of the according websites is crawled and analyzed. For bibtex entries the information contained in the bibtex table is taken into account. So tag candidates are determined without using any external service. The last source is also a web service called tagthe.net. It already has an engine, which recommends tags for a given URL. 2.1

Harvesting Tag Candidates

The rst step to a tag recommendation is the accumulation of candidates and additional information from the several sources. For every URL in the bookmark table, the del.icio.us service is queried. It can be accessed via a feed. The URL md5 hash of the resource is inserted into a URL-pattern. When the resulting URL is accessed, all available information is returned. It includes a count which speci es how often the URL was posted, and a list of top tags assigned to it. Each tag is supplemented with an information about the frequency with witch it has been assigned to the given resource.

In some cases it is possible to nd tag candidates for bibtex entries at citeulike.org. The description or misc eld of the bibtex table often contain a citeulikeid. With this id the according citeulike page can be called and crawled for the assigned tags. These tags are already classi ed as "tag", "publisher" and "author". Numerical information like a count cannot be obtained.

A similar kind of data is provided by the service tagthe.net. It generates classied tags for a submitted URL automatically and independent of the document format behind it. The available categories are "tag", "author", "person" and "location". Like citeulike.org, tagthe.net returns no numerical information. Anyway this service can be seen as a backup system, because it can provide tag candidates for every URL. Thus it is called for all URLs in the bookmark and the bibtex table. Unfortunately there is no information about the algorithmics of the service available.

In addition to the citeulike-ids the bibtex table contains further interesting information. The title, journal and description elds contain words that can be potentially used as concise tags. Hence after comparison with a stop word list all remaining words in these elds are interpreted as tag candidates. The last source for tag candidates are the websites behind the URLs in the bookmark table. After crawling and parsing the site's source code the words are counted and checked against a stop word list. Furthermore they are classi ed by the location of their appearance like in "meta keywords", "meta description", "title" and "body". 2.2

Selection Tags from the Candidates

The recommendation system has a hierarchical structure. This means there is one meta recommender which relies on several separate source recommenders, one for each source described in section 1.1. These recommenders provide up to 20 tag candidates for a queried content id. Every tag candidate is complemented by a score the according recommender assigns to it. This score can vary between 0 and 1. How the recommender constitutes that score depends on the source and is described in section 1.3. After the source recommenders have made their suggestions, the meta recommender takes all of the intermediate results and determines the actual tags. Therefor it combines a candidate's frequency of appearance in all sources k with the score si (1 i k) provided by the source recommenders. si is already in uenced by a weighting factor for the individual sources. How this factor is determined will be described in section 1.4. The nal score s for a tag candidate can be determined by the formula s = k (s1 + s2 + ::: + sk): (1)

Note that s can be a value greater than 1. When the meta recommender has calculated these aggregated scores for all candidates, they are ordered by this new information. The ve tags with the highest scores are selected as recommended tags. In order to prohibit recommendation of tags with a very low score, an optional lter can be set. This helps to enhance the precision. 2.3

Calculating the Individual Scores

As already stated the calculation of the single scores di ers depending on the source. The reason for this is the additional information, which is provided besides the tags. Every source o ers a di erent kind of information. To calculate a del.icio.us score sdelicious for a tag, the number of posts n which assigned it to the resource is devided by the total number p of posts for the resource at del.icio.us. After that the result is multiplied by the weight wdelicious constituted for del.icio.us.

sdelicious = wdelicious n p

The scores for tagthe.net and citeulike candidates are determined as follows: For each of the provided categories a weighting factor ccategorie is de ned: ctag = 1:0, cauthor = 0:3, cpublisher=0:2, clocation = 0:2, cperson = 0:1. The score stagthe=cite of each tag candidate is the result of the product of the source's weight wtagthe=cite and the categorie's weight ccategorie. In one case there is an exception from this practice. As can be detected in the training data the tag "juergen" appears pretty often. So this special tag is always handled with a score of 1:0. Otherwise all scores are determined by stagthe=cite = wtagthe=cite ccategorie: (2) (3)

The tag candidates generated from the bibtex entries are treated in a similar way. For some elds a weighting factor cfield is set: ctitle = 0:55, cjournal = 0:25, cdescription = 0:2. Only tags from the title eld are used as a suggestion for the meta recommender. These candidates get a higher score if they also appear in the journal or description eld. The individual scores for these elds are added to the initial title-score of 0:55.

The scoring process for the crawled content of a website is a little more complex. It is based on the information about the location of appearance as described in section 1.1. Three steps are passed until the nal score is determined. In the rst step a tag's frequency of appearance nlocation in the individual locations is counted. This value is weighted by a factor clocation which is related to the location's importance. E.g. a word from the title or the keyword-list is rather a good tag than one from the body. All counts are multiplied by their weighting factors get summed up to a kind of weighted frequency nwfreq for the whole resource nwfreq = ntitle ctitle + ndescription cdescription + nkeywords ckeywords + nbody cbody: (4)

If a tag appears at di erent locations, its score is raised in step two. This takes into account the fact that such a tag is a good tag in most cases. To raise the score, nwfreq is multiplied by a factor fcat which is related to the number of categories the tag appeared in.

The modi ed weighted frequency nwfreq is determined by fcat = 8 1; 1 categorie > >< 1:5; 2 categories

3; 3 categories > >: 5; 4 categories nwfreq = nwfreq fcat: (5) (6)

In the last step, nwfreq is normalized to the common score interval [0; 1]. Therefor every nwfreq is divided by the highest nwfreq which is reached for the crawled resource. The result is a score scc for a tag from the crawled content. The number of tags which are returned to the meta recommender is limited. Only the 20 tags with the highest scores are chosen. 2.4

Calculating the Source Weights

A key point in this two-stage recommendation approach is the calculation of suitable weighting factors for the di erent sources. Their tag-quality varies in a wide range. Thus handling the candidates of the individual recommenders as if they were of similar quality leads to bad results. As a foundation for the weighting factor calculation the postcore-2 of the training dataset was used. For every source the set of tag assignments it can supply was determined and an according tas le was generated. E.g. to calculate a factor for citeulike, a tas le with all content ids, which are associated with bibtex entries containing a citeulike-id, was generated. Corresponding to the tas les a result le was created for every source. This result was independent of the meta recommender and all the other sources. The result les were measured with f1-score against the tas les created before. The rst attempt was to use the f1 score a result le achieved as the weighting factor for the according source recommender.

However when testing the complete recommendation process against postcore-2, experiments with various weighting factors pointed out a better choice. Using the precision value which has been computed for the various result les to get the f1-score, provides better overall results. Thus all the weighting factors for the particular sources assumed in the meta recommender match the precision the respective source recommender reaches for the subset of tag assignments it can supply. 2.5

Results

Table 1 displays the results each recommender achieved for his own subset of the postcore-2 training data. The intermediate results in the rst ve rows show Test del.icio.us tagthe.net bibtex web content meta meta with lter Meta/Overall with lter

5285 0.446 / 6372 38383 0.418 / 40882

40468 0.066 / 51580

22341 0.155 / 22341

33193 0.150 / 40882

63107 0.350 / 64120

62104 0.344 / 64120

30844 0.132 / 43002 0.134 0.353 0.053 0.127 0.123 0.254 0.269 0.103 0.206 0.383 0.059 0.139 0.135 0.294 0.302 0.116 recall, precision and f1-score for every source recommender. The weighting factor later used for the meta recommender is the particular precision as described in section 1.4.

Row ve and six present the results of the meta recommender for the training data. The rst is generated without a lter whereas the second one uses a lter to avoid tag recommendations with very low scores as described in section 1.2.

The table shows that del.icio.us and citeulike produce good tag candidates. Recommendations made by other sources have a much lower quality. The meta recommenders result for the test dataset is far below the result of the training data set. 2.6

Conclusion

The comparison of the meta recommender's results for training and test data leads to the conclusion that the choice of sources was suboptimal. As the training results were computed on the postcore-2 the web services provided tags with a high quality. Every resource was posted in bibsonomy at least two times. So the probability that it is contained in the web services' databases as well, was pretty good. Thus the decrease in score might be explainable by the unpopularity of the resources in the test dataset.

In conclusion the two stage approach is a good attempt for popular resources. To raise the result quality for unpopular resources too, further sources with previously assigned tags have to be added. 2.7

Group 2: Variations for Task 1

Harvesting Tags: In addition to citeulike and del.icio.us, group 2 implemented a tag recommender using Google Scholar for bibtex entries. Using the title data only, links to the resource itself or similar resources were harvested. From the rst ten entries three were selected by counting identic words in the titles of the referenced documents (ignoring certain stop words). In the competition, the Google scholar tag recommender harvested roughly 60.000 links for 24.000 bibtex data entries. Subsequently, the Web content crawler of group 2 was used to obtain candidate tags from the three selected resources.

Furthermore, the web content crawler was able (to a certain extend) to parse PDF documents in addition to HTML documents. Also the implementation of the citeulike tag recommender di ered: a recent database dump of the citeulike data was used which made it possible to search for DOI-ids or URLs directly and to determine the overall frequency of tags in the citeulike data set. TagTheNet was not used. A straighforward tag recommender (called Data set recommender below) that analysed the relevant elds of the data entries complemented the set of recommenders.

Selecting Tags: No additional ltering for small scores was used. If less than 5 tags were harvested, the remaining slots for tags were lled by randomly drawing tags from the 30 tags most often used in bibsonomy4. The weights for a recommender could be di erent for bibtex and bookmark context. The weights for the 4 Not a very successful idea { rst evaluations show that this was rather counterproductive. recommenders were chosen manually, based on experiences while experimenting with the training data. Tags recommended by more than one recommender were rated signi cantly higher by multiplying their score with a factor which depends exponentially on the number of recommendations.5 Results: The results of group 2 for task 1 are presented in table 2. Additional quantitative data are shown in table 3.

Please note that due to a persistent problem with the CiteULike tag recommender, it is not listed in the training data and haven't been used in the actual competition. Therefore, the relevant F1-score for the ranking in the competition is 0.180 (second place). Some more details on the impact of the di erent tag recommenders will be presented at the workshop. The graph-based approach adapts the same hierarchical recommender structure the content-based approach is based on6. This means that there is also one meta recommender which combines the results of subordinated recommenders. Instead of using external sources, these recommenders implement di erent algorithms. Every algorithm processes bibsonomy's graph structure and the contained user information in a di erent way to generate a set of tag recommendations.

Like for the content-based approach every tag is valuated with a suitable weighting factor. This weighting factor also depends on the quality of the subordinated recommender. It is determined in the same way as for the source recommenders in Task 1 (cmp. section 1.4). The used benchmark data set contains the whole postcore-2. For processing the nal test data set, three di erent algorithms were used to supply the meta recommender. All of these algorithms have a di erent focus on evaluating the graph-structure. The speci c operation methods are presented in sections 2.1 to 2.3. 5 value = value * Math.pow((1.0 + 3*(count - 1)/10), count); 6 The following section documents the work of group 1 only The rst algorithm selects tags based on the resource information. Therefor all tags which have already been assigned to a queried resource are determined. Additionally each tag's frequency of appearance nlocal in combination with the resource is counted. To calculate the score str, this value is taken and divided by the number npost the resource was posted. The result is multiplied by the weighting factor wtr for this recommender. Like in section 1, the recommenders precision for postcore-2 is used for this purpose.

str = wtr nlocal npost (7) (8)

If two tags reach the same score, the one with the higher frequency of appearance in the entire postcore is preferred. 3.2

Tag by User

The second algorithm recommends and scores tags with a special relation to the user. Algorithmically this approach is very similar to the one described in section 2.1. At rst the set of tags is identi ed which the user who makes a post has assigned to other posts previously. The frequency of usage for the tags nlocal is also determined. From this set of tags the nlocal-value for the most used one is taken as a reference value nmax. The score stu this recommender calculates for a tag is as follows stu = wtu nlocal nmax

The weighting factor wtu is determined as usual (cmp. Section 2.1). 3.3

Tag by User Similarity

The last algorithm determines tags, which have been used by similar users. The similiarity between two users is de ned over the number of equal resources posted by them.

Therefor the rst task is the determination of similarities between all users. These are calculated by the Tanimoto-Score under consideration of the posted resources. These resources are used as comparison criterion for di erent users. In the recommending process the top ve of the similar users are identi ed. The tags they assigned to the posted resource are accumulated and sorted by their frequency of appearance ncount in the posts of the top ve users group. As a reference value the maximum frequency of appearance nmax of a tag in the same group is utilized. The nal score sus for a tag generated by this recommender is calculated with the following formula: sus = wus ncount nmax (9) The results for Task two are arranged the same way as the results for the contentbased approach in task one. The recall, precision and f1-score values for the training datatset are shown individually for every algorithm. Furthermore results are visualized for meta recommenders which use di erent combinations of the subordinated recommenders. A lter like the one described in 1.2 is tested too.

The recommendations for the test dataset have been made by a meta recommender which uses all of the three subordinated recommenders and a lter. They are a good deal worse than the results for the training dataset. Like for task 1 there is a dramatic decrease of the f1-score in the test. In this case the explanation can be found in the composition of the training data les. When evaluating the methods for task 2 with the training data, the post, a recommendation is made for, is not excluded from the dataset. Thus the tags of the post are de nitely in the training dataset. This increases the probability that the expected tags are recommended. As a consequence, the scores are higher as for the test data which are not included in post-core 2.

In conclusion, the evaluation of the training data is not valid, but the algorithm works correct for the test data.

Final remarks

The competition o ered an excellent test bed for our approach to tag recommendation. Though we are pretty happy with the results obtained, much work remains to be done: { setting the weights for the di erent recommenders could be improved (iterative algorithmic optimisation; cleaning/choosing training data), { relations to similar resources in the bibsonomy data set (identi ed for example with the help of the recommended tags or based on Web crawling / Google scholar results) could be explored: if, for a given resource x, a similar resource y in the bibsonomy data sets is identi ed, tag candidates can be drawn from resource y (either directly or via further graph-based or recursive similarity-based analysis), { ne-tuning of the individual tag recommenders, for example by coupling the

Web crawler to the tag database, to name just a few open topics. We want to thank the organizers7 and to express our hope that this stimulating and exciting competition will be continued in the future and that some of our results and suggestions may help others to develop further improved tag recommenders. 7 and the anonymous reviewers and Wolfram Conen for their valuable comments.

[1] Jaschke , R., Marinho , L.B. , Hotho , A. , Schmidt-Thieme , L. , Stumme , G.: Tag Recommendations in Folksonomies . In: PKDD 2007 , Springer ( 2007 )

[2] Sen , S. , Vig , J. , Rield , J.: Learning to Recognize Valuable Tags . In: IUT'09 , ACM ( 2009 )

[3] Zhang , S. , Farooq , U. , Carroll , J.M. : Enhancing Information Scent: Identifying and Recommending Quality Tags . In: GROUP'09, ACM ( 2009 )