Working the Crowd: Design Principles and Early Lessons
              from the Social-Semantic Web

Mathias Niepert, Indiana University, Department of Computer Science, mniepert@indiana.edu
Cameron Buckner, Indiana University, Department of Philosophy, cbuckner@indiana.edu
Colin Allen, Indiana University, Department of History and Philosophy of Science & Program in Cognitive Science, colallen@indiana.edu

ABSTRACT
The Indiana Philosophy Ontology (InPhO) project is presented as one of the first social-semantic web endeavors which aims to bootstrap feedback from users unskilled in ontology design into a precise representation of a specific domain. Our approach combines statistical text processing methods with expert feedback and logic programming approaches to create a dynamic semantic representation of the discipline of philosophy. We describe the basic principles and initial experimental results of our system.

General Terms
Social Semantic Web, Ontologies, Folksonomies, Provenance

1.   INTRODUCTION
   Until recently, research on the social web (Web 2.0) and semantic web has been largely segregated. This may not be surprising, as the two approaches seem to offer competing visions for the future of the Internet. Social web researchers devise ways to harness the “wisdom of the crowds” to structure web data around information obtained from collaborative social interactions between large numbers of amateur users. Semantic web researchers, on the other hand, emphasize the need for a technically precise backbone of formal ontologies developed by small groups of experts highly trained in the best practices of ontology design. Cultural differences have further fueled misconceptions and misunderstandings between these two research communities, often leading them to regard one another with mutual skepticism.
   Both approaches have had some striking successes. Web 2.0 applications like Wikipedia, Facebook, Del.icio.us, and Flickr have reshaped the way average users interact with the Web. A key strength of such approaches lies in their ability to obtain large amounts of information from unskilled volunteers and to combine information obtained from many different kinds of sources creatively. Such applications, however, face severe problems of data organization, validation, and integration, especially as they aspire to make data accessible and interoperable by organizing it according to semantic taxonomies. Some have proposed learning taxonomies from social tagging systems as a solution to this problem[2]. However, given that social tags are simply words applied to resources like documents and images, folksonomists have found themselves facing many of the same difficult problems that face researchers who try to induce taxonomies by processing natural language corpora. These problems include term ambiguity and the induced representation’s lack of structural depth, precision, and reasoning capabilities.
   While semantic web projects which impact the way the public is using the Web have largely failed to materialize, ontology-based approaches to data organization and integration have produced significant successes in certain domains, especially in bio- and medical informatics projects (such as the Gene Ontology) and in business applications. A factor severely hindering such approaches from being successfully applied to the Web at large, however, is that once elaborate and precise ontologies have been created, expertise in both ontology design and the relevant domain is required to populate and maintain them. Thus, semantic web projects have faced the dilemma of either hiring expensive “double experts” highly skilled in both ontology design and the relevant domain or facing inevitable data and user sparseness[3].
   Fortunately, researchers are beginning to realize that not only is there no inherent opposition between these two approaches, but that their strengths and weaknesses are complementary[1, 5]. Thus, some have begun to call for the development of the “social-semantic” web, which would combine the social web’s facility for obtaining data from volunteer users with the semantic web’s elegant and precise data representations. The combination of these two approaches faces its own unique set of problems, and large-scale social-semantic web projects which produce precise, high-quality data representation without presuming ontology design expertise of their users are still gleams in their future developers’ eyes[4]. In this paper, however, we describe the Indiana Philosophy Ontology (InPhO) project as one of the first social-semantic web endeavors which aims to bootstrap feedback from users unskilled in ontology design into a precise representation of the domain. We will describe our ongoing solutions to some of the challenges facing this nascent area of research. At the InPhO project, we are developing a dynamic ontology for the domain of philosophy. This knowledge base is being deployed primarily to serve the metadata needs of the Stanford Encyclopedia of Philosophy (SEP) (although it has a wide array of other uses). Our approach combines statistical text processing with expert feedback to create a dynamic semantic representation of the entities described in the SEP’s articles. While tagging approaches rely on users to spontaneously provide the needed feedback, our approach is based on the principle that if automated methods are used to guide users towards providing data which is most needed and for which they are most qualified, high-quality information can be obtained without placing undue demands on volunteer contributors.

2.     INPHO: BASIC PRINCIPLES FOR A
       SOCIAL SEMANTIC WEB PROJECT
   We believe that heavy user participation is the key to keeping both
the formal representation and the content of social semantic web
projects up to date and of the highest quality. In most cases, users
experience top-down and static
ontologies as too restrictive. Motivated by this considera-
tion, we propose some basic principles for social semantic
web projects which we strive to realize in the context of the
InPhO project.
Pragmatic Ontology Design
For many projects, especially those that rely on user par-
ticipation, it is often unfeasible to design a static top-down
ontology that models the targeted domain exhaustively. We
believe that the social semantic web is better served by var-
ious specialized and dynamic ontologies that utilize semi-
automated tools for information integration. Formal ontolo-
gies should be kept simple in the initial design phase and
they should be iteratively and dynamically extended and
populated through a combination of automated data pro-
cessing methods, user feedback, and logical reasoning[9].
Ontology Extension as Iterative Relation Addition
and Refinement
Many complex ontologies leave users bewildered by com-
plications and thereby languish with huge sections almost
entirely unpopulated. To ensure that data representations
remain both relevant and well-populated, we believe that
ontology design should be incremental and driven by user
participation. For example, InPhO’s influenced-by relation
between philosophers can easily be populated by validat-
ing and integrating semi-structured data from Wikipedia[8].
However, the relation does not carry any specific informa-
tion about what kind of influence and in which area of phi-
losophy the influence took place. Hence, at later stages,
one might decide to refine the relation by introducing a re-
lation influenced-in-area, which relates an instance of the
influenced-by relation to an instance of a philosophical area.
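The refinement just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not InPhO's actual storage layer: the class names `InfluencedBy` and `KnowledgeBase`, the method names, and the example data are all hypothetical. The point is that each instance of the binary influenced-by relation becomes an object in its own right, so a later influenced-in-area relation can attach an area to it without changing the original relation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InfluencedBy:
    """One instance of the binary influenced-by relation, treated as an object."""
    influenced: str
    influencer: str


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory store; names are illustrative assumptions."""
    influenced_by: set = field(default_factory=set)
    # Pairs (relation instance, area): the later refinement of the relation.
    influenced_in_area: set = field(default_factory=set)

    def add_influence(self, influenced, influencer):
        """Initial design phase: populate the simple binary relation."""
        rel = InfluencedBy(influenced, influencer)
        self.influenced_by.add(rel)
        return rel

    def tag_area(self, rel, area):
        """Later stage: tag an existing relation instance with an area."""
        if rel not in self.influenced_by:
            raise KeyError(f"unknown relation instance: {rel}")
        self.influenced_in_area.add((rel, area))

    def areas_of(self, influenced, influencer):
        """Return the areas attached to one influenced-by instance."""
        rel = InfluencedBy(influenced, influencer)
        return sorted(a for r, a in self.influenced_in_area if r == rel)


kb = KnowledgeBase()
rel = kb.add_influence("Kant", "Hume")   # illustrative data only
kb.tag_area(rel, "epistemology")
print(kb.areas_of("Kant", "Hume"))
```

Because the area is attached to the relation instance rather than to either philosopher, the binary relation can be populated first and refined later, which is the incremental design the text argues for.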
Note that this is a form of tagging of pairs of entities. This is also supported by current W3C standards: OWL (and RDF in general) natively supports binary relations only, but allows several methods for modeling higher-order relations¹. For example, the RDF standard allows relation instances to be treated as first-class citizens (reification). We believe that the pieces of information users are asked to provide should be kept as simple as possible and that the process should resemble the process of tagging. Projects that initially define intricate higher-order relations will have a hard time providing sufficient incentives for participation and will ultimately suffer from a lack of user contribution. Furthermore, we believe that formal ontologies (the set of relations and axioms) should grow with the practical needs of the individual semantic web projects and not vice versa.
¹ http://www.w3.org/TR/swbp-n-aryRelations/
Ontology Population as Iterative Data Addition, Validation, and Integration
Statistical text processing and other automated methods should be used to provide candidates for relation instances that can be verified and integrated using human feedback. The verification and addition of relation instances should resemble tagging as closely as possible. However, instead of tagging single web entities like documents, pictures, and videos, here pairs of entities are “tagged” with relationships holding between them, choosing from a predefined set of labels. For example, Figure 1 depicts one of InPhO’s interfaces which provides users with pairs of philosophical ideas in their area of expertise for which they can evaluate the relatedness and relative generality. In addition, users should be able to add data in batches and have access to an API for data entry.

Figure 1: InPhO’s “Idea Tree” interface which lets users quickly label relationships between pairs of philosophical ideas, ranked by statistical text processing algorithms.

Stratified Participation; Provenance and Trust
Most Web 2.0 projects are powered by the “wisdom of the crowd,” that is, many different users participating and collaborating to create large amounts of valuable (meta-)data. While we believe that large-scale semantic web projects will not succeed without leveraging the “wisdom of the crowd,” we are also proponents of the position that the input of some users should be considered more trustworthy and reliable than others. InPhO allows users to provide areas of expertise in their personal profile and leverages this information to guide users to contribute in meaningful ways. Through InPhO’s interfaces, all users are able to contribute to and populate the uncertain part of the ontology, and every piece of data is marked with detailed provenance information. When
logical reasoners are deployed to infer the taxonomic relationships, the provenance information is harnessed to resolve inconsistencies appropriately. For example, evaluations from users who are experts in this subfield of philosophy are valued higher than feedback from novice users. In addition, provenance information should be provided together with the instance data at all stages. For example, while birth and death date information is gathered by parsing external datasets and through contributions of InPhO’s users, only the data verified by experts (i.e., authors and editors of the SEP) will be used as metadata for SEP entries.
Open Data Access and Open Community
Users should be able to download the populated ontology together with the provenance information and use it in external applications. An API should give direct access to write and read operations. The project’s online community should be open to everyone and contributions should be visible and attributable to individual users.

[Figure 2 plot: histogram of user deviations for relatedness score. x-axis: user deviations for relatedness score (0 to 1.8); y-axis: number of users (0 to 10).]
Figure 2: Histogram of deviations of relatedness scores among InPhO users with overlap ≥ 10.

       0 (%)        1 (%)        2 (%)        3 (%)        4 (%)
0     54 (3.8)     62 (4.4)     38 (2.7)     25 (1.8)      9 (0.6)
1     62 (4.4)     33 (2.4)     73 (5.2)     61 (4.3)     35 (2.5)
2     38 (2.7)     73 (5.2)     62 (4.4)    116 (8.3)     84 (6.0)
3     25 (1.8)     61 (4.3)    116 (8.3)     91 (6.5)    253 (18.0)
4      9 (0.6)     35 (2.5)     84 (6.0)    253 (18.0)   409 (29.1)

Figure 3: Table depicting user agreement and disagreement on relatedness scores. Scores range from 0 (unrelated) to 4 (highly related). The entry in the i-th row and j-th column is the number of idea pairs that have been scored as i by one user and as j by a different user. The values in parentheses are the percentages with respect to all 1405 evaluations with overlap.

3.   INPHO: FIRST EXPERIENCES AND INITIAL RESULTS
   As of now, the Indiana Philosophy Ontology[8] contains four main categories: person (subclass of FOAF::person²), document (from AKT³), organization (from SUMO⁴), and philosophical idea, as well as an initial set of non-taxonomic relations. The idea category contains a taxonomic decomposition of the space of philosophical ideas according to the disciplinary relatedness of their contents rather than according to their structural roles. For example, instead of dividing idea about philosophy into concept, distinction, argument, counterexample, and so on, the InPhO decomposes it into
subareas of philosophy, e.g., idea about metaphysics, idea about epistemology, idea about logic, idea about ethics, idea about philosophy of mind. Each subarea is in turn decomposed into a series of issues considered fundamental to work in that subarea; for example, idea about philosophy of mind is decomposed into idea about consciousness, idea about intentionality, idea about mental content, idea about philosophy of artificial intelligence, idea about philosophy of psychology, and idea about metaphysics of mind. InPhO combines corpus-based measures of semantic similarity between words (for examples, see [7]) and a novel relative generality measure[8], to provide, for any given philosophical idea, a ranking of possible hyponyms and hypernyms, respectively (the interface is depicted in Figure 1). Using these carefully designed interfaces, InPhO’s users can validate or falsify the estimates of semantic relatedness and relative generality of pairs of philosophical ideas, using a predefined set of possible labels. The relatedness is scored on a five-point scale from highly related to unrelated, and the generality can be evaluated using four different options: same level of generality, idea1 is more general than idea2, idea1 is more specific than idea2, and the two are incomparable. The generality of two ideas is deemed incomparable if they are entirely unrelated or if one idea can be both more and less general than the other, depending on the context. Of course, users may skip idea pairs or provide only partial information. The feedback is stored as first-order facts in our knowledge base, together with provenance data. For example, when a user with id 45 provides the information that an idea about neural networks is more specific than an idea about connectionism, and that they are highly related, the facts msp(neural network, connectionism, 45) and s4p(neural network, connectionism, 45) are added to the knowledge base. For each user, automatically computed trust scores and levels of expertise are stored to evaluate her reliability. A non-monotonic answer set program with stable model semantics is used daily on the set of first-order facts to construct the global populated ontology[9]. The taxonomy can be browsed online⁵.
² http://xmlns.com/foaf/spec/
³ http://www.aktors.org/publications/ontology/
⁴ http://www.ontologyportal.org/
⁵ http://inpho.cogs.indiana.edu/taxonomy/

4.   A FRAMEWORK FOR DATA-DRIVEN TRUST MEASURES
   We introduce a general framework for the assignment of trust scores to individual users based on their deviation from other users’ evaluations. A method to compute degrees of trustworthiness of users in a social network using semantic and social web data sources was recently proposed[6]. Here, we focus on trust scores that are computed using the users’ evaluations of pairs of entities and their application to resolving feedback inconsistencies. Let U be the set of users, let A and B be two sets of individuals in the ontology, and let L be the set of possible labels that can be assigned to elements in A × B. Let the label distance dist : L × L → R+ be a function that assigns to each pair of labels a non-negative real number. Let
E = {(a, b, ℓ, u) | a ∈ A, b ∈ B, ℓ ∈ L, u ∈ U} be the set of 4-tuples representing the user evaluations, that is, the assignments of labels in L to elements in A × B by the users in U. We define the evaluation deviation measure D : U → R+ as

\[
D(u) = \frac{1}{|N(u)|} \sum_{(a,b,\ell,u) \in E} \; \sum_{(a,b,\ell',u') \in E \text{ with } u \neq u'} \operatorname{dist}(\ell, \ell'),
\]

with N(u) = {(a, b, ℓ′, u′) ∈ E | ∃(a, b, ℓ, u) ∈ E with u′ ≠ u}. Of course, the smaller the evaluation deviation, the higher the trust one can have in a particular user. The trust scores (some of which might be specialized to specific areas in philosophy) can then be used together with the users’ levels of expertise to enhance provenance information and settle feedback inconsistencies with increasing sophistication.
Initial Experimental Results
As of March 25th 2009, InPhO (currently in beta testing) has 92 registered users, 36 of whom provided one or more of the 4,653 evaluations of 2,969 distinct pairs of ideas. The set of users consists of volunteers who registered after the InPhO system had been announced on several e-mail newsletters and blogs. They will soon be joined by the authors and editors of the Stanford Encyclopedia of Philosophy. 39 out of the 92 users have the highest level of expertise (published in the area) and 37 finished a graduate class in the area. Of the 47 subareas of philosophy that are currently specified in the InPhO, 31 were covered by at least one expert. The contribution incentives are twofold: (1) users have their own personal account that displays type and number of contributions and several agreement statistics and (2) the more feedback a SEP author provides, the better her entry is embedded in browse and search applications. However, we consider the objective of providing sufficient incentives for user participation an ongoing research and interface design challenge.
   We are specifically interested in the extent of user agreement on evaluations of idea pairs with semantic relatedness and relative generality labels. Thus, in the remainder of the paper, A and B are the instances of the class philosophical idea in the ontology. Users can score the semantic relatedness of two philosophical ideas on a scale from 0 (unrelated) to 4 (highly related). Hence, for the relatedness score we have L = {0, 1, 2, 3, 4} and dist(ℓ, ℓ′) = |ℓ − ℓ′|. Figure 2 depicts the histogram of the evaluation deviation values for the 31 users who labeled the relatedness of one or more idea pairs that have also been evaluated by at least 10 other users (evaluation overlap ≥ 10). Except for some outliers, the majority of the users have a deviation of less than 0.5, where 4.0 is the possible maximum. Figure 3 shows the overall user agreement and disagreement. For example, only 9 out of 1405 overlapping evaluations (0.6%) have a label distance of 4, and 1153 out of 1405 overlapping evaluations (82%) have a label distance of 1 or 0.
   For the relative generality evaluations, L = {0, 1, 2, 3} with 0=“more specific”, 1=“more general”, 2=“same generality,” and 3=“incomparable/either more or less general.” Here, we can define dist as dist(ℓ, ℓ′) = 1 if ℓ ≠ ℓ′ and dist(ℓ, ℓ′) = 0 otherwise. Figure 4 depicts the histogram of the evaluation deviation values for the 30 users who labeled the relative generality of one or more idea pairs that have also been evaluated by at least 10 other users. All users have a deviation of less than or equal to 0.5, where 1.0 is the possible maximum. Figure 5 shows the overall user (dis-)agreement on generality labels. For example, 489 out of 917 overlapping evaluations (52%) agree on the label “more specific”, and there are only 33 overlapping evaluations (3.6%) with disagreeing labels “more specific” and “more general.”

[Figure 4 plot: histogram of user deviation for generality evaluations. x-axis: user deviation for generality evaluations (0 to 1); y-axis: number of users (0 to 10).]
Figure 4: Histogram of users’ deviation on relative generality labels with evaluation overlap ≥ 10.

            m.s. (%)      inc./e. (%)    same (%)     m.g. (%)
m.s.       489 (34.8)    127 (13.8)     79 (8.6)     33 (3.6)
inc./e.    127 (13.8)     19 (2.1)      37 (4.0)     32 (3.5)
same        79 (8.6)      37 (4.0)      35 (3.8)     49 (5.3)
m.g.        33 (3.6)      32 (3.5)      49 (5.3)     17 (1.9)

Figure 5: Table depicting user agreement and disagreement on generality evaluations. m.s.=more specific, m.g.=more general, same=same generality, inc./e.=incomparable/either more or less general, depending on the context. The values in parentheses are the percentages with respect to all 917 generality evaluations with overlap.

5.   REFERENCES
[1] A. Ankolekar, M. Krötzsch, D. T. Tran, and D. Vrandecic. The two cultures: Mashing up web 2.0 and the semantic web. Journal of Web Semantics, 6(1):70–75, 2008.
[2] D. Benz and A. Hotho. Position paper: Ontology learning from folksonomies. In LWA’07: Lernen, Wissen, Adaption, Workshop Proceedings, pages 109–112, 2007.
[3] C. Buckner, M. Niepert, and C. Allen. From encyclopedia to ontology: Toward dynamic representation of the discipline of philosophy. Synthese. forthcoming.
[4] G. Correndo and H. Alani. Survey of tools for collaborative knowledge construction and sharing. In Workshop on Collective Intelligence on Semantic Web, 2007.
[5] T. Gruber. Collective knowledge systems: Where the social web meets the semantic web. Journal of Web Semantics, 6(1):4–13, 2008.
[6] T. Heath, E. Motta, and M. Petre. Computing word-of-mouth trust relationships in social networks from semantic web and web2.0 data sources. In Proceedings of the Workshop on Bridging the Gap between Semantic Web and Web 2.0, 2007.
[7] C. D. Manning and H. Schuetze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[8] M. Niepert, C. Buckner, and C. Allen. A dynamic ontology for a dynamic reference work. In Proceedings of JCDL, pages 288–297. ACM Press, 2007.
[9] M. Niepert, C. Buckner, and C. Allen. Answer set programming on expert feedback to populate and extend dynamic ontologies. In Proceedings of FLAIRS, pages 500–505. AAAI Press, 2008.
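
As a supplementary illustration, the evaluation deviation measure D(u) defined in Section 4 can be computed with a short Python sketch. It is a minimal reading of the definition, not the InPhO implementation: the function names, the example idea pairs, the labels, and user ids 7 and 12 are hypothetical (only user id 45 and the pair neural network/connectionism appear in the text). The sketch assumes each user labels a given pair at most once and does not apply the evaluation overlap ≥ 10 filter used for Figures 2 and 4.

```python
from collections import defaultdict


def dist_relatedness(l1, l2):
    """Label distance for relatedness scores in {0, ..., 4}: |l - l'|."""
    return abs(l1 - l2)


def dist_generality(l1, l2):
    """Label distance for generality labels: 1 if they differ, else 0."""
    return 0 if l1 == l2 else 1


def evaluation_deviation(evaluations, dist):
    """Compute D(u) for every user whose neighbourhood N(u) is non-empty.

    `evaluations` is the set E as an iterable of (a, b, label, user) tuples.
    """
    by_pair = defaultdict(list)
    for a, b, label, user in evaluations:
        by_pair[(a, b)].append((label, user))

    total = defaultdict(float)      # double sum of label distances per user
    neighbours = defaultdict(set)   # N(u): other users' evaluations of u's pairs
    for (a, b), entries in by_pair.items():
        for label, user in entries:
            for other_label, other_user in entries:
                if other_user != user:
                    total[user] += dist(label, other_label)
                    neighbours[user].add((a, b, other_label, other_user))

    return {u: total[u] / len(neighbours[u]) for u in neighbours}


# Hypothetical relatedness evaluations (a, b, label, user).
E = [
    ("neural network", "connectionism", 4, 45),
    ("neural network", "connectionism", 3, 7),
    ("neural network", "connectionism", 4, 12),
    ("qualia", "consciousness", 4, 45),
    ("qualia", "consciousness", 2, 7),
]
D = evaluation_deviation(E, dist_relatedness)
for user in sorted(D):
    print(user, round(D[user], 3))
```

Under the one-label-per-pair assumption, the number of terms in the double sum equals |N(u)|, so D(u) is simply the mean label distance between a user's evaluations and everyone else's evaluations of the same pairs. Swapping in `dist_generality` gives the corresponding deviation for generality labels.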