<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Combining Tag Recommendations Based on User History</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Ilari</forename><forename type="middle">T</forename><surname>Nieminen</surname></persName>
							<email>ilari.nieminen@tkk.fi</email>
							<affiliation key="aff0">
								<orgName type="institution">Helsinki University of Technology</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Combining Tag Recommendations Based on User History</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">DAD0B3AC141710FCF89E4EFFFFF181CC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T06:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes our attempt at Task 2 of the ECML PKDD Discovery Challenge 2009. The task was to predict which tags a given user would assign to a given resource, using only methods that exploit the graph structure of the training dataset, a snapshot of BibSonomy. Our approach combines simple recommendation methods by weighting their recommendations based on the tagging history of each user.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Collaborative tagging systems, or folksonomies, have steadily gained popularity in recent years. Users are free to choose the tags they want to use, and while this may be a main reason behind the popularity of these systems, it is also one of the biggest problems they face. As users come up with new tags, they forget the tags they used before, which makes previously tagged content difficult to find. Tag recommendation can help both in search and in keeping users' tagging practices consistent. Tag recommendation can be defined as the problem of finding suitable tags or labels for a given resource and a given user.</p><p>Tag recommendation can be an important element of a folksonomy, as it can help users apply tags consistently and use the same tags for similar resources. This can improve searching both within the users' own resources and within the folksonomy as a whole.</p><p>We present a method for tag recommendation that combines several baseline methods and collaborative filtering. The recommendations are combined using the past performance of each recommender.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Tag Recommendation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Collaborative Filtering for Folksonomies</head><p>Collaborative filtering (CF), a popular method in recommender systems, can be adapted for tag recommendation. The description here is based on <ref type="bibr" target="#b0">[1]</ref>.</p><p>A folksonomy can be described as a tuple F = (U, R, T, Y), where U is the set of users, R is the set of resources (bookmarks and BibTeX entries in the case of BibSonomy <ref type="bibr" target="#b1">[2]</ref>), T is the set of tags, and Y ⊆ U × R × T is the tag assignment relation. The projections</p><formula xml:id="formula_0">\pi_{UR} Y \in \{0,1\}^{|U| \times |R|}, \quad (\pi_{UR} Y)_{u,r} := 1 \text{ iff } \exists t \in T \text{ s.t. } (u, r, t) \in Y, \qquad \pi_{UT} Y \in \{0,1\}^{|U| \times |T|}, \quad (\pi_{UT} Y)_{u,t} := 1 \text{ iff } \exists r \in R \text{ s.t. } (u, r, t) \in Y</formula><p>define the "resource neighbourhood" and the "tag neighbourhood" of the users, respectively. Let X be the chosen neighbourhood matrix (either projection) and x_u the row of X belonging to user u. The set of k nearest neighbours of user u is</p><formula xml:id="formula_1">N_u^k := \operatorname{argmax}^k_{v \in U} \operatorname{sim}(x_u, x_v)<label>(1)</label></formula><p>where sim is the cosine similarity</p><formula xml:id="formula_3">\operatorname{sim}(x, y) := \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}<label>(2)</label></formula><p>The set of n recommended tags for a given user-resource pair (u, r) is</p><formula xml:id="formula_4">T(u, r) := \operatorname{argmax}^n_{t \in T} \sum_{v \in N_u^k} \operatorname{sim}(x_u, x_v) \, \delta(v, r, t)<label>(3)</label></formula><p>where δ(v, r, t) := 1 if (v, r, t) ∈ Y and 0 otherwise.</p></div>
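<div xmlns="http://www.tei-c.org/ns/1.0"><p>The following Python sketch illustrates Equations 1-3 on a toy folksonomy. It is not the implementation used in the experiments: the data and the helper names (projection, cosine, neighbours, recommend) are hypothetical, each row x_u is stored as a set because the projections are binary, and excluding u from its own neighbourhood and breaking ties by name are assumptions.</p><p>
```python
import math
from collections import defaultdict

# Hypothetical toy tag assignments Y as (user, resource, tag) triples.
Y = {
    ("alice", "r1", "python"), ("alice", "r2", "ml"),
    ("bob", "r1", "python"), ("bob", "r3", "web"),
    ("carol", "r2", "ml"), ("carol", "r3", "web"), ("carol", "r3", "css"),
}


def projection(Y, axis):
    """Rows x_u of pi_UR Y (axis=1) or pi_UT Y (axis=2), stored as sets of column keys."""
    rows = defaultdict(set)
    for triple in Y:
        rows[triple[0]].add(triple[axis])
    return rows


def cosine(a, b):
    """Eq. (2) for binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a.intersection(b)) / math.sqrt(len(a) * len(b))


def neighbours(X, u, k):
    """Eq. (1): the k users most similar to u; u itself is excluded (assumption)."""
    x_u = X.get(u, set())  # .get avoids inserting u into the defaultdict
    scored = [(cosine(x_u, X[v]), v) for v in X if v != u]
    scored.sort(key=lambda sv: (-sv[0], sv[1]))  # ties broken by user name
    return scored[:k]


def recommend(Y, X, u, r, k=2, n=5):
    """Eq. (3): sum sim(x_u, x_v) over neighbours v that assigned tag t to resource r."""
    score = defaultdict(float)
    for sim, v in neighbours(X, u, k):
        if sim == 0.0:
            continue  # a zero-similarity neighbour adds nothing to any tag
        for (v2, r2, t) in Y:
            if v2 == v and r2 == r:  # delta(v, r, t) = 1
                score[t] += sim
    ranked = sorted(score.items(), key=lambda ts: (-ts[1], ts[0]))
    return [t for t, _ in ranked[:n]]


X_ut = projection(Y, axis=2)  # tag neighbourhood
print(recommend(Y, X_ut, "alice", "r3"))  # ['web', 'css']
```
</p><p>Passing axis=1 instead builds the resource neighbourhood π_UR Y; in this toy example both neighbourhoods yield the same ranking.</p></div>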
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Baseline Methods</head><p>The following simple recommendation methods do not produce very good recommendations on their own; their main redeeming quality is that they are computationally inexpensive.</p><p>Popular tags for a resource. If the users of the folksonomy are homogeneous, this method can be expected to perform almost as well as CF methods. However, if the users have very different tagging habits, or if they tag in different languages, performance for the minorities can be expected to suffer.</p><p>Popular tags for a user. Some users use relatively few but obscure tags, which means that the popular-tags-for-a-resource recommender will not work for them. Collaborative recommendations will not work well either, as such a user will probably have very few applicable "tag neighbours", and the "resource neighbours" will most likely not use the same tags. For example, user 483 used the tag "allgemein" a total of 2237 times in their 9003 posts. In other words, a randomly chosen post by this user has almost a 25% chance (2237/9003 ≈ 24.8%) of being tagged "allgemein".</p><p>Globally popular tags. Recommending the most used tags is perhaps the simplest possible method.</p><p>We used several variants of the aforementioned recommenders. These, and the method used to combine their recommendations, are described in Section 4.1.</p></div>
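<div xmlns="http://www.tei-c.org/ns/1.0"><p>Each of the three baselines reduces to counting tag frequencies over a different subset of the posts. A minimal sketch, assuming the posts are available as hypothetical (user, resource, tag set) tuples; the helper names and the alphabetical tie-breaking are illustrative choices, not details from the paper.</p><p>
```python
from collections import Counter

# Hypothetical training posts: (user, resource, set of tags).
posts = [
    ("u1", "r1", {"web", "tools"}),
    ("u2", "r1", {"web", "allgemein"}),
    ("u2", "r2", {"allgemein"}),
    ("u3", "r2", {"books", "web"}),
]


def top_n(counts, n=5):
    """Most frequent tags first; ties broken alphabetically so output is deterministic."""
    ranked = sorted(counts.items(), key=lambda tc: (-tc[1], tc[0]))
    return [t for t, _ in ranked[:n]]


def resource_tags(posts, r, n=5):
    """Popular tags for a resource: tags attached to r in any post."""
    return top_n(Counter(t for _, res, tags in posts if res == r for t in tags), n)


def user_tags(posts, u, n=5):
    """Popular tags for a user: the user's own most frequent tags."""
    return top_n(Counter(t for usr, _, tags in posts if usr == u for t in tags), n)


def global_tags(posts, n=5):
    """Globally popular tags, independent of user and resource."""
    return top_n(Counter(t for _, _, tags in posts for t in tags), n)


print(resource_tags(posts, "r1"))  # ['web', 'allgemein', 'tools']
print(user_tags(posts, "u2"))      # ['allgemein', 'web']
print(global_tags(posts, n=2))     # ['web', 'allgemein']
```
</p><p>For a user like user 483 above, the user-tags baseline would rank "allgemein" first. The popularity-filtered variants listed in Table 2 could be obtained by filtering these lists with Equation 4 before truncating them.</p></div>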
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Data Description and Preprocessing</head><p>The provided training data contains three files: bibtex, bookmark and tas. The bibtex and bookmark files describe the BibTeX entries and the links, respectively. The tas file contains the tag assignments. The organisers also provided the post-core at level 2 <ref type="bibr" target="#b2">[3]</ref>, a reduced dataset containing only those users, resources and tags that appear in at least two posts. The test set for this task was known to contain only users, resources and tags from this set.</p><p>We processed bookmarks and BibTeX entries identically. The only information we extracted from the "bookmark" and "bibtex" tables was the hash values identifying the resources. We used the url hash and simhash1 columns and did not attempt to merge duplicate resources. The url hash considers two resources different if there is any difference in the URL, such as a trailing slash.</p><p>To obtain slightly better neighbourhoods for the collaborative filtering approach, we used the full training set to calculate the neighbourhoods but removed the tags that could not appear in the results. Unlike the post-core at level 2, this left several partial posts in the training data.</p><p>No effort was made to separate functional tags (such as "myown" and "toread") from descriptive tags, which are considerably more interesting in tag recommendation.</p><p>Some of the most used tags in BibSonomy are used by a small minority, such as "juergen" (3101 posts, 2 users). In total, among the tags contained in the post-core at level 2, there are 273 tags that have been used at least 100 times by at most 5 users.
A measure of tag popularity that also takes into account the number of users of a tag can be defined as</p><formula xml:id="formula_5">\operatorname{popularity}(t) = \log(N_t) \cdot \log(N_t^{*})<label>(4)</label></formula><p>where N_t is the number of times the tag t has been used and N_t^* is the number of users of the tag t.</p><p>This measure can be used to improve tag recommendation methods that would not otherwise weight different tags. As can be seen from Table <ref type="table" target="#tab_0">1</ref>, sorting the tags by their "popularity" removes the unlikely tag "zzztosort" while preserving a sensible selection of popular tags.</p></div>
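<div xmlns="http://www.tei-c.org/ns/1.0"><p>Equation 4 can be computed in one pass over the tag assignments. A sketch, assuming (user, resource, tag) triples as input; the paper does not state the base of the logarithm, so the natural logarithm is an assumption. The base does not affect the ordering, since changing it scales every score by the same positive constant, but it does move a fixed threshold such as the one used in Table 2. The function name and the toy data are hypothetical.</p><p>
```python
import math
from collections import Counter, defaultdict


def tag_popularity(assignments):
    """Eq. (4): popularity(t) = log(N_t) * log(N*_t), using the natural logarithm.

    assignments: iterable of (user, resource, tag) triples.
    N_t is the number of assignments of tag t, N*_t its number of distinct users.
    """
    uses = Counter()
    users = defaultdict(set)
    for user, _, tag in assignments:
        uses[tag] += 1
        users[tag].add(user)
    return {t: math.log(uses[t]) * math.log(len(users[t])) for t in uses}


# Hypothetical data: "zzztosort" is used more often, but only by one user.
assignments = [
    ("a", "r1", "web"), ("a", "r2", "web"), ("b", "r3", "web"), ("c", "r4", "web"),
    ("a", "r1", "zzztosort"), ("a", "r2", "zzztosort"), ("a", "r3", "zzztosort"),
    ("a", "r4", "zzztosort"), ("a", "r5", "zzztosort"),
]
pop = tag_popularity(assignments)
print(sorted(pop, key=pop.get, reverse=True))  # ['web', 'zzztosort']
```
</p><p>A tag used by a single user always scores 0, however often it is used, which is why frequently used but idiosyncratic tags drop down the popularity ranking.</p></div>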
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Combining Recommendations</head><p>The baseline methods can yield good results for certain users, but they are generally worse than the alternatives. However, combining the baseline results with results from collaborative filtering or other methods can improve the overall results. The difficulty in combining results lies in estimating how trustworthy each recommender is.</p><p>In tag recommendation, multiple "items" are recommended at once, and apart from the similarity between the user and the user's neighbours there are few evident factors that could be used to weight the tags when combining different methods. In our method, we used the training data to predict each user's most recent posts (1-100 posts, but at most 20% of all of the user's posts).</p><p>In our approach, we took the somewhat arbitrary set of methods shown in Table <ref type="table" target="#tab_1">2</ref> and assigned a weight to each candidate tag, per user and per post, by computing a weighted sum over all recommenders:</p><formula xml:id="formula_6">w_t := \sum_{p} [t \in T_p] \cdot 0.9^{k} f_p<label>(5)</label></formula><p>where T_p is the list of tags recommended by method p ∈ {1, ..., 7}, [·] equals 1 if the condition holds and 0 otherwise, f_p is the F-measure of method p on the validation set, and k is the position of the tag t in the recommendation of method p. The factor 0.9^k slightly reduces the weight of tags further down each list, so that methods with a smaller F-measure have a better chance of getting a likely tag into the final results. The final recommendation consists of the five tags t with the highest w_t. Prior to the competition, we performed a test with the training data. The posts were divided into three sets based on the post date: the first 80% served as the training set, the following 10% as the validation set and the last 10% as the test set. The method weights were computed from the validation set.
The resulting weights were then evaluated on the test set, showing a modest 5% improvement in the F-measure over the best baseline method.</p></div>
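<div xmlns="http://www.tei-c.org/ns/1.0"><p>A sketch of Equation 5 and of the chronological 80/10/10 split used in the preliminary test. The function names, the dictionary-based post format with a "date" key, and the choice of a 1-based position k are assumptions; switching to a 0-based k would only rescale all weights by 0.9 and would not change the ranking.</p><p>
```python
from collections import defaultdict


def combine(recommendations, f_measure, n=5, decay=0.9):
    """Eq. (5): w_t = sum over methods p of [t in T_p] * decay**k * f_p.

    recommendations: method name -> ranked tag list T_p.
    f_measure: method name -> F-measure f_p on the validation posts.
    k is assumed to be the 1-based position of t in T_p.
    """
    weight = defaultdict(float)
    for method, tags in recommendations.items():
        f_p = f_measure.get(method, 0.0)
        for k, t in enumerate(tags, start=1):
            weight[t] += decay ** k * f_p
    ranked = sorted(weight.items(), key=lambda tw: (-tw[1], tw[0]))
    return [t for t, _ in ranked[:n]]


def chronological_split(posts, train=0.8, valid=0.1):
    """Split posts by date into training, validation and test sets (80/10/10 by default)."""
    ordered = sorted(posts, key=lambda post: post["date"])
    a = int(len(ordered) * train)
    b = int(len(ordered) * (train + valid))
    return ordered[:a], ordered[a:b], ordered[b:]


# Hypothetical per-post recommendations and per-user F-measures.
recs = {"resource tags": ["web", "tools"], "user tags": ["allgemein", "web"]}
f = {"resource tags": 0.3, "user tags": 0.1}
print(combine(recs, f))  # ['web', 'tools', 'allgemein']
```
</p><p>Here "web" collects weight from both methods (0.9 · 0.3 + 0.81 · 0.1 = 0.351), so agreement between recommenders pushes a tag up the final list.</p></div>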
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Experiment on the Competition Set</head><p>The weights for the methods were assigned to the users in the competition set by generating recommendations for their recent posts with all the methods listed in the previous section. The number of posts used was up to 100, but at most 20% of all of the user's posts. The F-measure of each method was then used to build a mixing profile for each user. Finally, recommendations were made for the competition set and combined using Equation 5. The results are summarized in Table <ref type="table" target="#tab_2">3</ref>. One of the baselines (resource tags, F-measure 0.3067) slightly outperforms the combined result (0.2952) on the competition set. Some of the recommenders, such as "resource tags", can produce very unlikely tags when the resource itself has been tagged only a few times and carries unpopular tags; this was not taken into account when combining the recommendations. A possible solution to this problem is to not recommend unlikely (unpopular) tags unless the user has used them in the past.</p></div>
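<div xmlns="http://www.tei-c.org/ns/1.0"><p>The per-user mixing profile can be sketched as follows: choose how many of the user's most recent posts to hold out, run each method on them, and use each method's F-measure as its weight f_p in Equation 5. How the 1-100 limit interacts with the 20% limit for very small histories, and the averaging convention of the F-measure (precision and recall averaged over posts, then combined), are assumptions of this sketch rather than details taken from the paper; all names and data are hypothetical.</p><p>
```python
def holdout_size(n_posts, cap=100, fraction=0.2):
    """Number of a user's most recent posts used to estimate method weights.

    Assumption: 20% of the user's posts, capped at `cap` and raised to at least 1.
    """
    return max(1, min(cap, int(n_posts * fraction)))


def f_measure(predicted, actual, n=5):
    """F-measure at n: precision and recall averaged over posts, then combined.

    predicted: ranked tag lists, one per post; actual: true tag sets in the same order.
    """
    if not actual:
        return 0.0
    p_sum = r_sum = 0.0
    for rec, true in zip(predicted, actual):
        top = set(rec[:n])
        hits = len(top.intersection(true))
        p_sum += hits / len(top) if top else 0.0
        r_sum += hits / len(true) if true else 0.0
    p, r = p_sum / len(actual), r_sum / len(actual)
    return 2 * p * r / (p + r) if p + r else 0.0


def mixing_profile(method_predictions, actual):
    """Per-user weights f_p: each method's F-measure on the user's held-out posts."""
    return {m: f_measure(preds, actual) for m, preds in method_predictions.items()}


# Hypothetical held-out posts of one user and two methods' predictions for them.
actual = [{"web", "css"}, {"python"}]
preds = {"resource tags": [["web", "html"], ["python"]], "popular tags": [["web"], ["web"]]}
print(mixing_profile(preds, actual))
```
</p><p>In this toy case the resource-tags method receives a weight of 0.75 and the popular-tags method about 0.33, so its tags would dominate this user's combined recommendations.</p></div>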
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>In these experiments, the weights of the recommenders are based on their past performance, but these weights could likely also be estimated from statistical features of the user, such as the average "popularity" of the user's tags and the number of distinct tags. We would like to study these features for correlations with recommender performance. Recommendations by other methods, such as FolkRank <ref type="bibr" target="#b0">[1]</ref>, could be added to improve the performance on the dense parts of the data.</p><p>The obtained results were less than stellar; in retrospect, more attention should have been paid to combining the results, and especially to the fact that the results of the individual recommenders were far from independent. Some method for filtering the results should have been applied, perhaps by modifying the weights of individual tags according to whether the target user has used a tag before and how popular the tag is. Simple methods should not be completely neglected, as they can provide useful results for users who do not conform to the tagging practices of the mainstream users of the folksonomy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Discussion</head><p>F-measure works as a performance measure for tag recommendation to a certain extent, but the utility of tag recommendation methods for usability and search within a folksonomy should be confirmed with user tests. Combining different tag recommendation results with different weights at different times may also make the recommendations feel inconsistent.</p><p>Searching within a folksonomy is sometimes unnecessarily difficult. Part of the problem is that users tend to use only a few tags per post. One improvement for these tagging systems would be to ask the user whether a set of tags similar to the ones they have already chosen also applies. It might make sense to distinguish between the problems of tag prediction, that is, predicting the tags a user will choose, and tag recommendation, that is, finding descriptive tags for a resource.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Tags ordered by number of uses and by "popularity"</figDesc><table><row><cell>Number of uses</cell><cell>Popularity</cell></row><row><cell>bookmarks</cell><cell>software</cell></row><row><cell>zzztosort</cell><cell>web</cell></row><row><cell>video</cell><cell>web20</cell></row><row><cell>software</cell><cell>video</cell></row><row><cell>programming</cell><cell>blog</cell></row><row><cell>web20</cell><cell>bookmarks</cell></row><row><cell>books</cell><cell>programming</cell></row><row><cell>media</cell><cell>internet</cell></row><row><cell>tools</cell><cell>tools</cell></row><row><cell>web</cell><cell>social</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Recommendation methods</figDesc><table><row><cell>Method</cell></row><row><cell>Collaborative filtering (UR neighbourhood)</cell></row><row><cell>Collaborative filtering (UT neighbourhood)</cell></row><row><cell>Most frequent tags by resource</cell></row><row><cell>Most frequent tags by resource (popularity &gt; 3)</cell></row><row><cell>Most frequent user tags</cell></row><row><cell>Most frequent user tags (popularity &gt; 3)</cell></row><row><cell>Most popular global tags</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Results on the competition set</figDesc><table><row><cell>Method</cell><cell>F-measure with 5 tags</cell></row><row><cell>CF-UR</cell><cell>0.2084</cell></row><row><cell>CF-UT</cell><cell>0.2317</cell></row><row><cell>resource tags</cell><cell>0.3067</cell></row><row><cell>resource tags (popularity &gt; 3)</cell><cell>0.2940</cell></row><row><cell>user tags</cell><cell>0.0935</cell></row><row><cell>user tags (popularity &gt; 3)</cell><cell>0.0050</cell></row><row><cell>popular tags</cell><cell>0.0354</cell></row><row><cell>combined</cell><cell>0.2952</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Acknowledgements</head><p>The author acknowledges Heikki Kallasjoki's technical assistance and Mari-Sanna Paukkeri's comments. This work was supported by the Academy of Finland through the Adaptive Informatics Research Centre that is a part of the Finnish Centre of Excellence Programme.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Tag recommendations in folksonomies</title>
		<author>
			<persName><forename type="first">R</forename><surname>Jäschke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">B</forename><surname>Marinho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hotho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Schmidt-Thieme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Stumme</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">PKDD</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">N</forename><surname>Kok</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Koronacki</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>De Mántaras</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Matwin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Mladenic</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Skowron</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">4702</biblScope>
			<biblScope unit="page" from="506" to="514" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">BibSonomy: A social bookmark and publication sharing system</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hotho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jäschke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Schmitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Stumme</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Workshop on Conceptual Structure Tool Interoperability at the Int. Conf. on Conceptual Structures</title>
				<meeting>Workshop on Conceptual Structure Tool Interoperability at the Int. Conf. on Conceptual Structures</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="87" to="102" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Generalized cores</title>
		<author>
			<persName><forename type="first">V</forename><surname>Batagelj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zaversnik</surname></persName>
		</author>
		<idno>cs.DS/0202039</idno>
		<ptr target="http://arxiv.org/abs/cs/0202039" />
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
