Working the Crowd: Design Principles and Early Lessons from the Social-Semantic Web

Mathias Niepert
Indiana University
Department of Computer Science
mniepert@indiana.edu

Cameron Buckner
Indiana University
Department of Philosophy
cbuckner@indiana.edu

Colin Allen
Indiana University
Department of History and Philosophy of Science & Program in Cognitive Science
colallen@indiana.edu

ABSTRACT
The Indiana Philosophy Ontology (InPhO) project is presented as one of the first social-semantic web endeavors which aims to bootstrap feedback from users unskilled in ontology design into a precise representation of a specific domain. Our approach combines statistical text processing methods with expert feedback and logic programming approaches to create a dynamic semantic representation of the discipline of philosophy. We describe the basic principles and initial experimental results of our system.

General Terms
Social Semantic Web, Ontologies, Folksonomies, Provenance

1. INTRODUCTION
Until recently, research on the social web (Web 2.0) and semantic web has been largely segregated. This may not be surprising, as the two approaches seem to offer competing visions for the future of the Internet. Social web researchers devise ways to harness the “wisdom of the crowds” to structure web data around information obtained from collaborative social interactions between large numbers of amateur users. Semantic web researchers, on the other hand, emphasize the need for a technically precise backbone of formal ontologies developed by small groups of experts highly trained in the best practices of ontology design. Cultural differences have further fueled misconceptions and misunderstandings between these two research communities, often leading them to regard one another with mutual skepticism.

Both approaches have had some striking successes. Web 2.0 applications like Wikipedia, Facebook, Del.icio.us, and Flickr have reshaped the way average users interact with the Web. A key strength of such approaches lies in their ability to obtain large amounts of information from unskilled volunteers and to combine information obtained from many different kinds of sources creatively. Such applications, however, face severe problems of data organization, validation, and integration, especially as they aspire to make data accessible and interoperable by organizing it according to semantic taxonomies. Some have proposed learning taxonomies from social tagging systems as a solution to this problem[2]. However, given that social tags are simply words applied to resources like documents and images, folksonomists have found themselves facing many of the same difficult problems that face researchers who try to induce taxonomies by processing natural language corpora. These problems include term ambiguity and the induced representation’s lack of structural depth, precision, and reasoning capabilities.

While semantic web projects which impact the way the public is using the Web have largely failed to materialize, ontology-based approaches to data organization and integration have produced significant successes in certain domains, especially in bio- and medical informatics projects (such as the Gene Ontology) and in business applications. A factor severely hindering such approaches from being successfully applied to the Web at large, however, is that once elaborate and precise ontologies have been created, expertise in both ontology design and the relevant domain is required to populate and maintain them. Thus, semantic web projects have faced the dilemma of either hiring expensive “double experts” highly skilled in both ontology design and the relevant domain or facing inevitable data and user sparseness[3].

Fortunately, researchers are beginning to realize that not only is there no inherent opposition between these two approaches, but that their strengths and weaknesses are complementary[1, 5]. Thus, some have begun to call for the development of the “social-semantic” web, which would combine the social web’s facility for obtaining data from volunteer users with the semantic web’s elegant and precise data representations. The combination of these two approaches faces its own unique set of problems, and large-scale social-semantic web projects which produce precise, high-quality data representations without presuming ontology design expertise of their users are still gleams in their future developers’ eyes[4]. In this paper, however, we describe the Indiana Philosophy Ontology (InPhO) project as one of the first social-semantic web endeavors which aims to bootstrap feedback from users unskilled in ontology design into a precise representation of the domain. We will describe our ongoing solutions to some of the challenges facing this nascent area of research.

At the InPhO project, we are developing a dynamic ontology for the domain of philosophy. This knowledge base is being deployed primarily to serve the metadata needs of the Stanford Encyclopedia of Philosophy (SEP) (although it has a wide array of other uses). Our approach combines statistical text processing with expert feedback to create a dynamic semantic representation of the entities described in the SEP’s articles. While tagging approaches rely on users to spontaneously provide the needed feedback, our approach is based on the principle that if automated methods are used to guide users towards providing data which is most needed and for which they are most qualified, high-quality information can be obtained without placing undue demands on volunteer contributors.

2. INPHO: BASIC PRINCIPLES FOR A SOCIAL SEMANTIC WEB PROJECT
We believe that heavy user participation is key for social semantic web projects for keeping both the formal representation and its content up-to-date and of the highest quality. In most cases, users experience top-down and static ontologies as too restrictive. Motivated by this consideration, we propose some basic principles for social semantic web projects which we strive to realize in the context of the InPhO project.

Pragmatic Ontology Design
For many projects, especially those that rely on user participation, it is often infeasible to design a static top-down ontology that models the targeted domain exhaustively. We believe that the social semantic web is better served by various specialized and dynamic ontologies that utilize semi-automated tools for information integration. Formal ontologies should be kept simple in the initial design phase and they should be iteratively and dynamically extended and populated through a combination of automated data processing methods, user feedback, and logical reasoning[9].

Ontology Extension as Iterative Relation Addition and Refinement
Many complex ontologies leave users bewildered by complications and thereby languish with huge sections almost entirely unpopulated. To ensure that data representations remain both relevant and well-populated, we believe that ontology design should be incremental and driven by user participation. For example, InPhO’s influenced-by relation between philosophers can easily be populated by validating and integrating semi-structured data from Wikipedia[8]. However, the relation does not carry any specific information about what kind of influence took place and in which area of philosophy. Hence, at later stages, one might decide to refine the relation by introducing a relation influenced-in-area, which relates an instance of the influenced-by relation to an instance of a philosophical area.
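The refinement step just described can be illustrated with a minimal sketch. It assumes a plain in-memory representation; the class names `InfluencedBy` and `InfluencedInArea`, the helper `areas_of`, and the example areas are hypothetical and are not taken from InPhO's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfluencedBy:
    """One instance of the simple binary influenced-by relation."""
    influenced: str
    influencer: str


@dataclass(frozen=True)
class InfluencedInArea:
    """Later refinement: relates an influenced-by instance to an area."""
    influence: InfluencedBy
    area: str


def areas_of(influence, facts):
    """Return the sorted areas attached to one influenced-by instance."""
    return sorted(f.area for f in facts if f.influence == influence)


# Stage 1: populate the binary relation only.
base = InfluencedBy(influenced="Kant", influencer="Hume")

# Stage 2: refine it later without changing the stage-1 data.
# The areas below are illustrative labels, not InPhO data.
refinements = [
    InfluencedInArea(influence=base, area="epistemology"),
    InfluencedInArea(influence=base, area="metaphysics"),
]

print(areas_of(base, refinements))  # ['epistemology', 'metaphysics']
```

The point of the sketch is that the second relation takes an instance of the first as its argument, so the initial data remains valid when the ontology is extended.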
Note that this is a form of tagging of pairs of entities. This is also supported by current W3C standards: OWL (and RDF in general) natively supports binary relations only, but allows several methods for modeling higher-order relations¹. For example, the RDF standard allows relation instances to be treated as first-class citizens (reification). We believe that the pieces of information users are asked to provide should be kept as simple as possible and that the process should resemble the process of tagging. Projects that initially define intricate higher-order relations will have a hard time providing sufficient incentives for participation and will ultimately suffer from a lack of user contribution. Furthermore, we believe that formal ontologies (the set of relations and axioms) should grow with the practical needs of the individual semantic web projects and not vice versa.

¹ http://www.w3.org/TR/swbp-n-aryRelations/

Ontology Population as Iterative Data Addition, Validation, and Integration
Statistical text processing and other automated methods should be used to provide candidates for relation instances that can be verified and integrated using human feedback. The verification and addition of relation instances should resemble tagging as closely as possible. However, instead of tagging single web entities like documents, pictures, and videos, here pairs of entities are “tagged” with relationships holding between them, choosing from a predefined set of labels. For example, Figure 1 depicts one of InPhO’s interfaces which provides users with pairs of philosophical ideas in their area of expertise for which they can evaluate the relatedness and relative generality. In addition, users should be able to add data in batches and have access to an API for data entry.

Figure 1: InPhO’s “Idea Tree” interface which lets users quickly label relationships between pairs of philosophical ideas, ranked by statistical text processing algorithms.

Stratified Participation; Provenance and Trust
Most Web 2.0 projects are powered by the “wisdom of the crowd,” that is, many different users participating and collaborating to create large amounts of valuable (meta-)data. While we believe that large-scale semantic web projects will not succeed without leveraging the “wisdom of the crowd,” we are also proponents of the position that the input of some users should be considered more trustworthy and reliable than that of others. InPhO allows users to provide areas of expertise in their personal profile and leverages this information to guide users to contribute in meaningful ways. Through InPhO’s interfaces, all users are able to contribute to and populate the uncertain part of the ontology, and every piece of data is marked with detailed provenance information. When logical reasoners are deployed to infer the taxonomic relationships, the provenance information is harnessed to resolve inconsistencies appropriately. For example, evaluations from users who are experts in the relevant subfield of philosophy are valued higher than feedback from novice users. In addition, provenance information should be provided together with the instance data at all stages. For example, while birth and death date information is gathered by parsing external datasets and through contributions of InPhO’s users, only the data verified by experts (i.e., authors and editors of the SEP) will be used as metadata for SEP entries.

Open Data Access and Open Community
Users should be able to download the populated ontology together with the provenance information and use it in external applications. An API should give direct access to write and read operations. The project’s online community should be open to everyone and contributions should be visible and attributable to individual users.

3. INPHO: FIRST EXPERIENCES AND INITIAL RESULTS
As of now, the Indiana Philosophy Ontology[8] contains four main categories: person (subclass of FOAF::person²), document (from AKT³), organization (from SUMO⁴), and philosophical idea, as well as an initial set of non-taxonomic relations. The idea category contains a taxonomic decomposition of the space of philosophical ideas according to the disciplinary relatedness of their contents rather than according to their structural roles. For example, instead of dividing idea about philosophy into concept, distinction, argument, counterexample, and so on, the InPhO decomposes it into subareas of philosophy, e.g., idea about metaphysics, idea about epistemology, idea about logic, idea about ethics, idea about philosophy of mind. Each subarea is in turn decomposed into a series of issues considered fundamental to work in that subarea; for example, idea about philosophy of mind is decomposed into idea about consciousness, idea about intentionality, idea about mental content, idea about philosophy of artificial intelligence, idea about philosophy of psychology, and idea about metaphysics of mind.

InPhO combines corpus-based measures of semantic similarity between words (for examples, see [7]) and a novel relative generality measure[8] to provide, for any given philosophical idea, a ranking of possible hyponyms and hypernyms, respectively (the interface is depicted in Figure 1). Using these carefully designed interfaces, InPhO’s users can validate or falsify the estimates of semantic relatedness and relative generality of pairs of philosophical ideas, using a predefined set of possible labels. The relatedness is scored on a five-point scale from highly related to unrelated, and the generality can be evaluated using four different options: same level of generality, idea1 is more general than idea2, idea1 is more specific than idea2, and the two are incomparable. The generality of two ideas is deemed incomparable if they are entirely unrelated or if one idea can be both more and less general than the other, depending on the context. Of course, users may skip idea pairs or provide only partial information.

The feedback is stored as first-order facts in our knowledge base, together with provenance data. For example, when a user with id 45 provides the information that an idea about neural networks is more specific than an idea about connectionism, and that they are highly related, the facts msp(neural network, connectionism, 45) and s4p(neural network, connectionism, 45) are added to the knowledge base. For each user, automatically computed trust scores and levels of expertise are stored to evaluate her reliability. A non-monotonic answer set program with stable model semantics is run daily on the set of first-order facts to construct the global populated ontology[9]. The taxonomy can be browsed online⁵.

² http://xmlns.com/foaf/spec/
³ http://www.aktors.org/publications/ontology/
⁴ http://www.ontologyportal.org/
⁵ http://inpho.cogs.indiana.edu/taxonomy/

4. A FRAMEWORK FOR DATA-DRIVEN TRUST MEASURES
We introduce a general framework for the assignment of trust scores to individual users based on their deviation from other users’ evaluations. A method to compute degrees of trustworthiness of users in a social network using semantic and social web data sources was recently proposed[6]. Here, we focus on trust scores that are computed using the users’ evaluations of pairs of entities and their application to resolving feedback inconsistencies. Let U be the set of users, let A and B be two sets of individuals in the ontology, and let L be the set of possible labels that can be assigned to elements in A × B. Let the label distance dist : L × L → R+ be a function that assigns to each pair of labels a non-negative real number. Let E = {(a, b, ℓ, u) | a ∈ A, b ∈ B, ℓ ∈ L, u ∈ U} be the set of 4-tuples representing the user evaluations, that is, the assignments of labels in L to elements in A × B by the users in U. We define the evaluation deviation measure D : U → R+ as

\[
D(u) = \frac{1}{|N(u)|} \sum_{(a,b,\ell,u) \in E} \; \sum_{\substack{(a,b,\ell',u') \in E \\ u \neq u'}} \mathrm{dist}(\ell, \ell'),
\]

with N(u) = {(a, b, ℓ′, u′) ∈ E | ∃(a, b, ℓ, u) ∈ E with u′ ≠ u}. Of course, the smaller the evaluation deviation, the higher the trust one can have in a particular user. The trust scores (some of which might be specialized to specific areas in philosophy) can then be used together with the users’ levels of expertise to enhance provenance information and settle feedback inconsistencies with increasing sophistication.

Initial Experimental Results
As of March 25th 2009, InPhO (currently in beta testing) has 92 registered users, 36 of whom provided one or more of the 4,653 evaluations of 2,969 distinct pairs of ideas. The set of users consists of volunteers who registered after the InPhO system had been announced on several e-mail newsletters and blogs. They will soon be joined by the authors and editors of the Stanford Encyclopedia of Philosophy. 39 out of the 92 users have the highest level of expertise (published in the area) and 37 finished a graduate class in the area. Of the 47 subareas of philosophy that are currently specified in the InPhO, 31 were covered by at least one expert. The contribution incentives are twofold: (1) users have their own personal account that displays type and number of contributions and several agreement statistics and (2) the more feedback a SEP author provides, the better her entry is embedded in browse and search applications. However, we consider the objective of providing sufficient incentives for user participation an ongoing research and interface design challenge.

We are specifically interested in the extent of user agreement on evaluations of idea pairs with semantic relatedness and relative generality labels. Thus, in the remainder of the paper, A and B are the instances of the class philosophical idea in the ontology. Users can score the semantic relatedness of two philosophical ideas on a scale from 0 (unrelated) to 4 (highly related). Hence, for the relatedness score we have L = {0, 1, 2, 3, 4} and dist(ℓ, ℓ′) = |ℓ − ℓ′|. Figure 2 depicts the histogram of the evaluation deviation values for the 31 users who labeled the relatedness of one or more idea pairs that have also been evaluated by at least 10 other users (evaluation overlap ≥ 10). Except for some outliers, the majority of the users have a deviation of less than 0.5, where 4.0 is the possible maximum. Figure 3 shows the overall user agreement and disagreement. For example, only 9 out of 1405 overlapping evaluations (0.6%) have a label distance of 4, and 1153 out of 1405 overlapping evaluations (82%) have a label distance of 1 or 0.

Figure 2: Histogram of deviations of relatedness scores among InPhO users with overlap ≥ 10. (x-axis: user deviations for relatedness score, 0 to 1.8; y-axis: number of users, 0 to 10.)

      0 (%)        1 (%)        2 (%)        3 (%)        4 (%)
0     54 (3.8)     62 (4.4)     38 (2.7)     25 (1.8)     9 (0.6)
1     62 (4.4)     33 (2.4)     73 (5.2)     61 (4.3)     35 (2.5)
2     38 (2.7)     73 (5.2)     62 (4.4)     116 (8.3)    84 (6.0)
3     25 (1.8)     61 (4.3)     116 (8.3)    91 (6.5)     253 (18.0)
4     9 (0.6)      35 (2.5)     84 (6.0)     253 (18.0)   409 (29.1)

Figure 3: Table depicting user agreement and disagreement on relatedness scores. Scores range from 0 (unrelated) to 4 (highly related). The entry in the i-th row and j-th column is the number of idea pairs that have been scored as i by one user and as j by a different user. The values in parentheses are the percentages with respect to all 1405 evaluations with overlap.

For the relative generality evaluations, L = {0, 1, 2, 3} with 0 = “more specific”, 1 = “more general”, 2 = “same generality,” and 3 = “incomparable/either more or less general.” Here, we can define dist as dist(ℓ, ℓ′) = 1 if ℓ ≠ ℓ′ and dist(ℓ, ℓ′) = 0 otherwise. Figure 4 depicts the histogram of the evaluation deviation values for the 30 users who labeled the relative generality of one or more idea pairs that have also been evaluated by at least 10 other users. All users have a deviation of less than or equal to 0.5, where 1.0 is the possible maximum. Figure 5 shows the overall user (dis-)agreement on generality labels. For example, 489 out of 917 overlapping evaluations (52%) agree on the label “more specific”, and there are only 33 overlapping evaluations (3.6%) with disagreeing labels “more specific” and “more general.”

Figure 4: Histogram of users’ deviation on relative generality labels with evaluation overlap ≥ 10. (x-axis: user deviation for generality evaluations, 0 to 1; y-axis: number of users, 0 to 10.)

          m.s. (%)      inc./e. (%)   same (%)     m.g. (%)
m.s.      489 (34.8)    127 (13.8)    79 (8.6)     33 (3.6)
inc./e.   127 (13.8)    19 (2.1)      37 (4.0)     32 (3.5)
same      79 (8.6)      37 (4.0)      35 (3.8)     49 (5.3)
m.g.      33 (3.6)      32 (3.5)      49 (5.3)     17 (1.9)

Figure 5: Table depicting user agreement and disagreement on generality evaluations. m.s. = more specific, m.g. = more general, same = same generality, inc./e. = incomparable/either more or less general, depending on the context. The values in parentheses are the percentages with respect to all 917 generality evaluations with overlap.

5. REFERENCES
[1] A. Ankolekar, M. Krötzsch, D. T. Tran, and D. Vrandecic. The two cultures: Mashing up web 2.0 and the semantic web. Journal of Web Semantics, 6(1):70–75, 2008.
[2] D. Benz and A. Hotho. Position paper: Ontology learning from folksonomies. In LWA’07: Lernen, Wissen, Adaption, Workshop Proceedings, pages 109–112, 2007.
[3] C. Buckner, M. Niepert, and C. Allen. From encyclopedia to ontology: Toward dynamic representation of the discipline of philosophy. Synthese, forthcoming.
[4] G. Correndo and H. Alani. Survey of tools for collaborative knowledge construction and sharing. In Workshop on Collective Intelligence on Semantic Web, 2007.
[5] T. Gruber. Collective knowledge systems: Where the social web meets the semantic web. Journal of Web Semantics, 6(1):4–13, 2008.
[6] T. Heath, E. Motta, and M. Petre. Computing word-of-mouth trust relationships in social networks from semantic web and web2.0 data sources. In Proceedings of the Workshop on Bridging the Gap between Semantic Web and Web 2.0, 2007.
[7] C. D. Manning and H. Schuetze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[8] M. Niepert, C. Buckner, and C. Allen. A dynamic ontology for a dynamic reference work. In Proceedings of JCDL, pages 288–297. ACM Press, 2007.
[9] M. Niepert, C. Buckner, and C. Allen. Answer set programming on expert feedback to populate and extend dynamic ontologies. In Proceedings of FLAIRS, pages 500–505. AAAI Press, 2008.
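
As an illustration of the evaluation deviation measure D from Section 4, the following minimal sketch computes D(u) for every user from a list of (a, b, label, user) evaluations. The function name `evaluation_deviation`, the user ids, and the sample data are hypothetical. The sketch also assumes that each user labels a given pair at most once, so that counting the compared evaluations equals |N(u)|.

```python
from collections import defaultdict


def evaluation_deviation(evaluations, dist):
    """Compute D(u) for each user with at least one overlapping evaluation.

    evaluations: iterable of (a, b, label, user) tuples (the set E).
    dist: function mapping two labels to a non-negative number.
    Assumes each user labels a given pair (a, b) at most once.
    """
    by_pair = defaultdict(list)
    for a, b, label, user in set(evaluations):
        by_pair[(a, b)].append((label, user))

    total = defaultdict(float)  # double sum of label distances per user
    count = defaultdict(int)    # |N(u)|: other users' overlapping evaluations
    for entries in by_pair.values():
        for label, user in entries:
            for other_label, other_user in entries:
                if other_user != user:
                    total[user] += dist(label, other_label)
                    count[user] += 1
    return {user: total[user] / count[user] for user in count}


def relatedness_dist(x, y):
    """Label distance for relatedness scores 0..4, as in Section 4."""
    return abs(x - y)


# Hypothetical evaluations: (idea a, idea b, relatedness label, user id).
E = [
    ("x", "y", 4, "u1"), ("x", "y", 3, "u2"), ("x", "y", 0, "u3"),
    ("p", "q", 2, "u1"), ("p", "q", 2, "u2"),
]
D = evaluation_deviation(E, relatedness_dist)
print(D["u3"])  # 3.5
```

For the generality labels, the same function can be called with a 0/1 distance such as `lambda x, y: 0 if x == y else 1`. Users without any overlapping evaluation are left out of the result, since |N(u)| would be zero for them.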