Fostering knowledge evolution through community-based participation

Domenico Gendarmi
University of Bari, Dipartimento di Informatica
Via E. Orabona, 4 - 70125 Bari
+390805442286
gendarmi@di.uniba.it

Fabio Abbattista
University of Bari, Dipartimento di Informatica
Via E. Orabona, 4 - 70125 Bari
+390805443298
fabio@di.uniba.it

Filippo Lanubile
University of Bari, Dipartimento di Informatica
Via E. Orabona, 4 - 70125 Bari
+390805443261
lanubile@di.uniba.it

Copyright is held by the author/owner(s). WWW 2007, May 8-12, 2007, Banff, Canada.

ABSTRACT
The ontology development process is typically led by single experts or small groups of experts, with users mostly playing a passive role. Such an elitist approach in building ontologies hinders the primary purpose of large-scale knowledge sharing. Collaborative tagging systems have emerged as a new web annotation method, offering appealing features that encourage users to collaboratively organize information through their own metadata. Collaborative tagging shifts the creation of metadata for indexing web resources from an individual professional activity to a collective endeavor, where every user is a potential contributor.

In this paper we introduce an approach to knowledge evolution which aims to exploit the ability of collaborative tagging to foster the participation of community members in moving forward an initial knowledge structure. We present user scenarios about how subscribers of a scientific digital library might play the role of knowledge organizers through personal organization and sharing of citations of interest.

Categories and Subject Descriptors
H.3.5 Online Information Services, H.3.7 Digital Libraries, H.5.3 Group and Organization Interfaces.

General Terms
Design, Human Factors.

Keywords
Community, knowledge evolution, collaborative tagging.

1. INTRODUCTION
Knowledge is strongly tied up with cognitive and social aspects, as the management of knowledge occurs within a tangled, structured social context. Human and social factors involved in the development and exchange of knowledge have a heavy impact on the design of systems supporting knowledge management [16].

Such a collaborative knowledge construction process takes place when multiple participants contribute to the growth of interpretations on a shared information base, simultaneously extended by information seeking and transformations [15].

In order to help community members construct knowledge in their own personal perspectives while also negotiating shared understanding, two needs have to be addressed. First, people need ways to find and work with information that matches their personal needs, interests, and capabilities. Then people need to bring together their individual knowledge to build a shared understanding and collaborative outcomes [14]. This can be accomplished by the Semantic Web, whose main goal is to enable computers and people to work in cooperation [1].

Ontologies play a relevant role within the Semantic Web vision, because they make it possible to cope with heterogeneous representations of web resources, providing a common understanding of a domain to be shared among human beings and software agents [6]. The domain model implicit in an ontology can be taken as a unified structure for giving information a common representation and semantics [2]. However, the ontology development process is typically led by single experts or small groups of experts, with users mostly playing a passive role. Such an elitist approach in building ontologies hinders the primary purpose of large-scale knowledge sharing.

Widespread participation in the ontology development process is often hampered by entry barriers, like the lack of easy-to-use and intuitive tools for ontology contribution. Barriers to active participation, combined with traditional top-down approaches in building ontologies, force users to conform to an undesirable knowledge representation. Such an imposition weakens common ground and increases the likelihood that the ontology will not be widely used.

Ontologies need to change as fast as the parts of the world they describe [7]. However, changes have to be captured and applied by skilled knowledge engineers, preferably the original creators of the ontology. This is a bottleneck which causes unacceptable delays in the ontology maintenance process.

A reasonable assumption on how to reduce maintenance costs is to spread the burden across users. In fact, given the Web's fractal nature, costs might decrease as ontology users increase in number [13]. Community participation in ontology development has already been identified as a solution for a more complete and up-to-date structured knowledge construction [19]. Besides being groups of users with common interests, communities can then be considered as the top layer of the Semantic Web architecture [12].

This paper describes our vision for enabling a community of autonomous users to cooperate in a dynamic and open environment, collectively evolving an initial knowledge structure. Participants can organize some piece of knowledge according to a self-established vocabulary, building up personal taxonomies for searching and browsing through their own information spaces. By sharing portions of their knowledge, users can also create connections and negotiate meaning with people having similar interests.

The main goals of the proposed approach are: (1) to allow users to organize personal information spaces, starting from a prearranged knowledge structure; and (2) to take advantage of users' contributions to better reflect the community evolution of a shared knowledge structure.

The rest of the paper is organized as follows. Section 2 provides background information about collaborative tagging systems. In Section 3 we describe our approach to community-based evolution through a specific context, a scientific digital library, and a number of user scenarios. Section 4 summarizes related work that can be seen as complementary to our approach. Finally, Section 5 draws conclusions and points out some challenges we are going to address in the near future.

2. COLLABORATIVE TAGGING SYSTEMS
One of the major obstacles hindering the widespread adoption of controlled vocabularies is the constant growth of available content, which outpaces the ability of any single authority to create and index metadata. In such contexts collaborative tagging represents a potential solution to the vocabulary problem [4].

Collaborative tagging has emerged as a new social-driven annotation method, as it shifts the creation of metadata for describing web resources from an individual professional activity to a collective endeavor, where every user is a potential contributor.

Figure 1 shows a conceptual model of collaborative tagging, according to UML notation [3], with tags seen as association classes between users and resources. Users can label any resource with whatever tag they consider appropriate and, vice versa, resources can be annotated with any tag by any user. Users are able to share both resources and tags within a community, leading to a network of users, resources and tags with a flat structure and no limits in evolution.

Figure 1. Conceptual model of collaborative tagging

Collaborative tagging systems exhibit other interesting benefits, such as their ability to adhere to the personal way of thinking. The absence of forced restrictions on the allowed terms, as well as the lack of a syntax to learn, can significantly shorten the learning curve. Collaborative tagging systems also create a strong sense of community amongst their users, allowing them to realize how others have categorized the same resource or how the same tag has been used to label different resources. This immediate feedback leads to an attractive form of asynchronous communication through metadata [10]. There is no need to establish a common agreement on the meaning of a tag because it gradually emerges with the use of the system. Marginal opinions can coexist with popular ones without disrupting the implicit emerging consensus on the meaning of the terms.

The main drawbacks with tags concern semantic and cognitive issues, such as polysemy, synonymy and basic level variation [5].

Polysemy occurs when the same term is used for tags employed with different meanings. The polysemy problem affects query results by returning potentially related but often inappropriate resources. Polysemy is occasionally equated with homonymy; however, polysemous words have different meanings but related senses, while homonyms have multiple, unrelated meanings.

Synonymy takes place when different terms are used for tags having the same meaning. Synonymous tags are another source of ambiguity, severely hindering the discovery of all the relevant resources which are available in a tagging system. Polysemy and synonymy represent two critical aspects of a search, as they respectively affect precision and recall, which are typically used for evaluating information retrieval systems.

A further relevant problem, concerning the cognitive aspect of categorization, is the basic level variation of tags. Terms used to describe a resource can vary along a continuum of specificity ranging from very general to particularly specific. Different users can use terms at different levels of abstraction to describe the same resource, leading to low recall in retrieving resources.

Collaborative tagging is also referred to as "folksonomy". This term, originally coined by Thomas Vander Wal, who combined the words "folk" and "taxonomy", refers to a taxonomy created by common people [17]. However, taxonomies are hierarchical structures of classifications with parent-child relationships among concepts.

While it is well-known that search and retrieval are facilitated by structured subject headings, the tags which form a folksonomy are just flat terms. Besides the previous drawbacks, the lack of a structure is one of the main aspects which severely weaken information retrieval in a collaborative tagging system.

3. OUR APPROACH TO COMMUNITY KNOWLEDGE EVOLUTION
In this section we lay out our approach for applying collaborative tagging techniques to support the evolution of a knowledge structure adopted for the classification of a large number of digital resources. We first briefly introduce a scientific digital library that we have selected as an application context. Then we present the knowledge evolution process from a user perspective.

3.1 Approach Context
As an illustrative context for our approach, we consider the digital library of the Association for Computing Machinery (ACM).

The ACM Guide to Computing Literature is an index to computing literature from over 3000 publishers, containing over 750,000 citations of books, journal articles, conference proceedings, doctoral and master's theses, and technical reports. Citations can be browsed by publication type, author name, as well as authors' keywords and classification terms from the ACM taxonomy, named The Computing Classification System.

The ACM Guide to Computing Literature is part of the services offered by the ACM Portal. Portal subscribers can create any number of binders, which are personal collections of citations with links to the publication source through the Digital Object Identifier (DOI) bookmark, and the full text if the citation is published by ACM itself. When creating their binders, users choose whether to keep them private or share them with other selected users or, more generally, the public.

3.2 User Perspective
According to our approach, the interaction process of a user with a digital library can be characterized as a three-step iteration (Figure 2).

1. Selection. It involves discovering and choosing a specific citation in the whole repository. This step is already available in a common digital library.

2. Organization. It involves creating and structuring a personal information space according to individual interests. This step goes beyond current opportunities because it allows users not only to store collections of citations of interest but also to group them using the desired metadata and structure.

3. Sharing. It involves making public some selected collections and corresponding metadata in order to support a community knowledge evolution.

To explain how our approach can affect the user experience, we present below a scenario for each step.

Figure 2. Three-step iteration

3.2.1 Selection
John is an ACM member with a web account on the Portal. As an assignment, he has to write a state of the art about collaborative tagging systems. He is not looking for well-known papers; rather, his goal is to explore the recent bibliography on this specific topic to discover new scientific articles he could find interesting to read.

In order to find citations within the ACM Portal, John has two options: he can perform a search (basic or advanced); otherwise he can browse the repository in several different ways. For example, he can browse through the Guide using index terms of the ACM taxonomy or he can browse through the Digital Library according to the kinds of publications. However, due to the limitations of the current taxonomy in organizing citations, especially for articles about recent topics such as collaborative tagging, John prefers to use the search feature.

John performs a simple query, within the Guide, using the phrase collaborative tagging as keywords. A list of results showing a set of basic information (e.g. title, authors, publishers, year of publication) for each matching citation is presented to John ordered by relevance. John can then select a specific citation to let the system display additional information related to that article (e.g. abstract, references, index terms, collaborative colleagues). After exploring some results in more detail, John finds a citation of interest, the article named "Usage patterns of collaborative tagging systems". John wants to save it into his own personal information space using the "Save this Article to a Binder" feature (Figure 3).

Figure 3. Detailed page of the selected citation

3.2.2 Organization
John now has to choose the name of the binder in which to save the selected citation. This name represents the label of a specific category playing the role of a virtual folder in which to store a collection of citations. In choosing the name John is supported by a suggestion feature providing a set of potential binder names. In this case some suggested binder names can be collaborative tagging systems, delicious studies and social bookmarking analyses. John chooses to store the citation in a binder named tagging patterns.

Saving an article into a virtual personal space is a sign of real interest in the citation, hence we can assume that John is willing to provide the metadata he considers most appropriate for annotating the selected citation. However, to avoid burdening John's experience, authoring metadata has to remain as simple as in collaborative tagging systems.

The task assigned to John is just to browse a space of suggested metadata, pointing out his favorites and possibly proposing new ones. Through the DOI, the system is able to uniquely identify the selected citation, and a large set of metadata related to that article can be retrieved from different systems freely available on the web. For example, for the selected citation the system could retrieve keywords from ACM, as well as tags from services like CiteULike, Bibsonomy and Connotea (Figure 4).

Figure 4. Retrieved metadata of the selected citation

Using a filtering process to discard useless keywords or tags, such as those occurring in isolation, and to group very similar ones, this space of metadata can be normalized in order to help John in the browsing task (Figure 5).

Figure 5. Space of metadata

While browsing, John can select a metadata term and, simply by picking it, he can state his agreement or disagreement (e.g. Y/N). In this case, browsing the space in Figure 5, John selects classification and expresses an agreement with such a term.

Using a lexical resource, such as WordNet, a search for possible multiple senses associated with the selected term can be performed. Four senses are retrieved from WordNet for the noun classification and John disambiguates these senses by selecting the first one (Figure 6). Furthermore, WordNet can provide synonyms, hypernyms and hyponyms related to the selected sense (Figure 7). The system can thus map the term chosen by John to a corresponding concept including relationships with other related concepts.

Figure 6. Senses for the term classification

Figure 7. Synonyms, hypernyms and hyponyms for the selected sense of the term classification

John now has to decide the best position, within the ACM taxonomy, where to put the concept corresponding to the selected term classification. In such a task John can be supported by the system through some recommendations suggesting possible relevant parts of the taxonomy where the concept could already exist or where the concept could be inserted.

For example, a possible suggestion can be to attach the new concept as a child of information storage (Figure 8). If John approves this suggestion, a relationship between information storage and classification will be added and the new taxonomy will be stored in John's personal information space. From now on, the digital library will keep track of new concepts in John's personal taxonomy, and additions of new concepts will be checked to avoid inconsistencies. The selected citation will be automatically classified in John's personal space, according to the new concept just added (Figure 9).

Figure 8. Suggested taxonomy branch where to attach the concept associated with the term classification

Figure 9. Personal taxonomy

While browsing the space of metadata, John can select and agree with another term, such as collaborative tagging, which might not have any associated sense in WordNet. In this case John does not have to disambiguate any sense, but he has to provide a brief description of the concept. In any case, John has to find the right place in the taxonomy where to insert the concept corresponding to the selected term.

John can also disagree with a term in the space of metadata; in such a situation he can optionally propose new terms. Proposing a new term leads to the same scenario as if he had chosen an existing one in the space of metadata.

3.2.3 Sharing
John's information space will be structured in a set of binders where he will store citations classified according to his favorite metadata. Moreover, storing and annotating citations will give rise to an evolving personal taxonomy which John can exploit to browse through his personal space. Using the digital library, a user profile will be created in order to keep track of topics of interest. For each binder created by John, one or more corresponding topics of interest will be included in his profile (Figure 10).

Figure 10. Creation of a user profile

John now chooses to share the binder just created, named tagging patterns. Within John's profile the system looks for one or more topics of interest associated with that binder. Having established the topic of the shared binder, the system looks for other profiles with the same topic, in order to find users who share similar interests with John.

For example, two other users, Michael and Lucia, have in their profiles analogous topics about collaborative tagging dynamics. Michael has in his personal space a shared binder named tagging studies, with the same citation stored by John and two other citations, titled "Tagging, communities, vocabulary, evolution" and "HT06, tagging paper, taxonomy, Flickr, academic article, to read". Figure 11 shows a portion of Michael's personal taxonomy, which describes how Michael has classified citations within his shared binder.

Figure 11. A portion of Michael's personal taxonomy

Lucia has shared a binder named tagging systems analyses where she stored all the citations in Michael's binder and the citation named "What goes around comes around: an analysis of del.icio.us as social space". Figure 12 shows the portion of Lucia's personal taxonomy relative to all the citations in her shared binder.

Figure 12. A portion of Lucia's personal taxonomy

Once John has shared the binder, he gains access to a shared information space concerning a particular topic related to the binder. In this shared space, John can view all users interested in the same topic, all citations relevant to the topic stored by these users, as well as one or more shared taxonomies. Every taxonomy in this shared space is meant to represent a particular perspective on that topic, depicting a common way to classify related citations employed by a group of people with similar interests.

One or more shared portions of these taxonomies are recommended to John. He is now allowed to rank suggestions in accordance with his own perspective. As a result, the shared information space will be displayed to John (Figure 13). Now John can perform any of the following actions:

• browse through users' personal information spaces, viewing user profiles, taxonomies, shared binders, unless they have been kept private;

• discover new citations about the topic collaborative tagging and add them to either the shared binder or a new one;

• observe how shared taxonomies have been ranked by other users and express his own grade.

After John has shared his binder, users who have previously contributed to the shared space will be notified about changes. Afterwards, users can check the information space in order to discover new users with similar interests, new citations about the topic, and changes to the shared taxonomies.

John hence contributes to a community perspective for the topic of interest by sharing his personal metadata as well as expressing his preference on the shared taxonomies. On the other hand, he gets feedback for his personal organization while actively taking part in the community.

Figure 13. The resulting shared information space

4. RELATED WORK
While our approach aims to apply collaborative tagging concepts to the problem of knowledge evolution, much research work assumes the opposite perspective: discovering semantic relations among tags to enhance how current collaborative tagging systems work.

Mika [11] extends the traditional bipartite model of ontologies with the social dimension, leading to a tripartite model of ontologies with three different classes of nodes, namely persons, concepts, and instances, and hyperedges representing the commitment of a person in terms of classifying an instance as belonging to a certain concept. This model is exploited by generating two kinds of association networks: the network of concepts and instances and the network of people and concepts. From the association network of concepts and instances, a classification hierarchy is extracted. From the network of people and concepts, the author generates a hierarchy based on sub-community relationships.

Hotho et al. [9] propose an adaptation of a data mining approach to detect emergent semantics within a collaborative tagging system. The adaptation lies in reducing the three-dimensional folksonomy to a two-dimensional formal context in order to apply association rule mining techniques. Discovered association rules can then be exploited in a recommender system which supports the user in choosing useful tags. The obtained rules can also be seen as subsumption relations, in order to learn a taxonomic structure.

In [8] the authors present an algorithm that tries to address the basic level variation issue by converting a large corpus of tags into a navigable hierarchical taxonomy. Tags are grouped using vectors according to the number of times each tag has been used for every annotated resource. Then, the algorithm defines a function to calculate similarity between vectors and a threshold to prune irrelevant values. Finally, for a given dataset a tag similarity graph is created exploiting the social network notion of graph centrality. Starting from the similarity graph and according to three fundamental hypotheses, namely hierarchy representation, noise and general-general assumptions, a latent hierarchical taxonomy is extracted.

Wu et al. [18] exploit a probabilistic generative model to represent the user's annotation behavior in a social bookmarking system and to automatically derive the emergent semantics of the tags. Starting from the assumption that tags heavily used by users with similar interests are semantically related, the authors apply statistical techniques to discover semantic relationships from the different frequencies of co-occurrences among users, resources and tags. The resulting emergent semantics of user interests, tags and web resources is then exploited to develop an intelligent semantic search system for searching and discovering semantically related web resources.

5. CONCLUSION
This paper provides a community-driven approach to knowledge evolution. Although we have depicted scenarios for a research community, the proposal applies to other online communities. As in collaborative tagging systems, the main idea is to shift the creation of metadata from a restricted to a collective activity, while still maintaining the expressiveness an ontology can provide for classification.

Knowledge engineers struggle to capture all the variety taking place within a lively community. We hypothesize that augmenting users' participation in the process of annotating and classifying shared items reflects the community knowledge more effectively than relying on prescribed knowledge structures maintained by a central authority. A collaborative approach to knowledge evolution can split costs over a wide group of people who have special interests in specific knowledge domains.

The scenarios presented in this paper point out how challenging it is to directly involve users in the knowledge evolution process. We need to provide tool support to allow community members to easily organize their personal information spaces and contribute with minimal overhead. We intend to develop a software agent which is able to monitor users' interactions with the system and learn about users' interests. The agent will gain access to metadata in users' personal information spaces to discover topics of interest. In order to enable software agents to better handle metadata, users' tags will be rendered as RDF statements rather than simple keywords expressed in natural language.

The approach presented here is a first step toward a collaborative knowledge evolution system with the aim of providing an enhanced infrastructure supporting the ever-evolving community knowledge through the active participation of its members.

6. REFERENCES
[1] Berners-Lee, T., Hendler, J., and Lassila, O. The Semantic Web. Scientific American (2001).

[2] Davies, J., Fensel, D., and Harmelen, F. Towards the Semantic Web: Ontology-driven Knowledge Management. John Wiley & Sons, 2003.

[3] Fowler, M. UML Distilled: A Brief Guide to the Standard Object Modeling Language. Addison-Wesley Professional, 2004.

[4] Furnas, G., Landauer, T., Gomez, L., and Dumais, S. The vocabulary problem in human-system communication. Communications of the ACM, 30, 11 (1987), 964-971.

[5] Golder, S. and Huberman, B. Usage patterns of collaborative tagging systems. Journal of Information Science, 32, 2 (2006), 198-208.

[6] Gruber, T. Toward Principles for the Design of Ontologies Used for Knowledge Sharing. International Journal Human-Computer Studies 43 (1993), 907-928.

[7] Haase, P., Völker, J., and Sure, Y. Management of dynamic knowledge. Journal of Knowledge Management, 9, 5 (2005), 97-107.

[8] Heymann, P. and Garcia-Molina, H. Collaborative Creation of Communal Hierarchical Taxonomies in Social Tagging Systems. Technical Report InfoLab 2006-10, Department of Computer Science, Stanford University, Stanford, CA, USA (2006).

[9] Hotho, A., Jäschke, R., Schmitz, C., and Stumme, G. Emergent Semantics in BibSonomy. Proc. Workshop on Applications of Semantic Technologies, Informatik 2006, Dresden, 2006.

[10] Mathes, A. Folksonomies-Cooperative Classification and Communication Through Shared Metadata. Technical Report, LIS590CMC, Computer Mediated Communication, Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign, 2004.

[11] Mika, P. Ontologies are us: A unified model of social networks and semantics. Proceedings of the 4th International Semantic Web Conference (ISWC 2005), LNCS 3729, Springer-Verlag, 2005.

[12] Mika, P. Social Networks and the Semantic Web: The Next Challenge. IEEE Intelligent Systems 20 (2005).

[13] Shadbolt, N., Berners-Lee, T., and Hall, W. The Semantic Web Revisited. IEEE Intelligent Systems, 21, 3 (2006), 96-101.

[14] Stahl, G. Collaborative information environments to support knowledge construction by communities. AI Soc. 14, 1 (Apr. 2000), 71-97.

[15] Suthers, D. D. Collaborative Knowledge Construction through Shared Representations. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05), 2005.

[16] Thomas, J. C., Kellogg, W. A., and Erickson, T. The knowledge management puzzle: Human and social factors in knowledge management. IBM Systems Journal 40, 4 (2001), 863-884.

[17] Vander Wal, T. Folksonomy Definition and Wikipedia. 2005.

[18] Wu, X., Zhang, L., and Yu, Y. Exploring social annotations for the semantic web. Proc. of the 15th international conference on World Wide Web (2006), 417-426.

[19] Zhdanova, A. V., Krummenacher, R., Henke, J., and Fensel, D. Community-Driven Ontology Management: DERI Case Study. In Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2005), IEEE Computer Society Press, 2005.
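
The filtering step described in Section 3.2.2 (discarding keywords or tags that occur in isolation and grouping very similar ones) can be illustrated with a minimal Python sketch. The paper does not specify the filtering algorithm, so everything below is an assumption made for illustration: the function names `normalize` and `build_metadata_space`, the grouping rule (case folding, separator unification, naive plural stripping), the `min_count` threshold, and the sample assignments, which are invented and are not the metadata actually retrieved in Figure 4.

```python
from collections import Counter, defaultdict

def normalize(term: str) -> str:
    """Hypothetical grouping key: lowercase, unify separators, strip a trailing 's'."""
    key = term.lower().replace("-", " ").replace("_", " ")
    key = " ".join(key.split())  # collapse repeated whitespace
    if key.endswith("s") and len(key) > 3:
        key = key[:-1]  # naive plural handling, an assumption of this sketch
    return key

def build_metadata_space(assignments, min_count=2):
    """Group similar terms and drop isolated ones.

    assignments: iterable of (source, term) pairs collected for one citation.
    Returns a dict mapping the most frequent spelling of each group to the
    number of times the group was used; groups used fewer than min_count
    times are discarded as isolated.
    """
    counts = Counter()
    variants = defaultdict(Counter)
    for _source, term in assignments:
        key = normalize(term)
        counts[key] += 1
        variants[key][term] += 1
    space = {}
    for key, n in counts.items():
        if n >= min_count:
            label = variants[key].most_common(1)[0][0]
            space[label] = n
    return space

# Invented sample data (not taken from the paper's figures).
SAMPLE = [
    ("ACM", "classification"),
    ("CiteULike", "Classification"),
    ("Connotea", "classification"),
    ("CiteULike", "tag"),
    ("Bibsonomy", "tags"),
    ("Connotea", "tag"),
    ("Bibsonomy", "social bookmarking"),
    ("CiteULike", "social-bookmarking"),
    ("Connotea", "social bookmarking"),
    ("CiteULike", "toread"),  # occurs once, so it is filtered out
]

if __name__ == "__main__":
    for label, n in build_metadata_space(SAMPLE).items():
        print(label, n)
```

A string-level grouping rule like this one only merges spelling variants; it does nothing for synonymy or basic level variation, which in the approach above are addressed later through sense disambiguation and placement in the taxonomy.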