User Interface Design Considerations for Linked Data Authoring Environments
Stephen Davies, Jesse Hatfield, Chris Donaher, Jessica Zeitz
University of Mary Washington
1301 College Ave
Fredericksburg, VA 22401
1-540-654-1317
{sdavies, jhatfiel, cdonaher, jzeitz}@umw.edu

ABSTRACT
If non-technical end users are to contribute to the Web of Data as they have to the Web of Documents, they must employ tools that enable them to do so. This challenge is not easy to meet, as formal knowledge representation is a daunting task for the uninitiated. Indeed, we have empirically observed that expressing anything but the most straightforward of facts in RDF-compatible format is extremely difficult for newcomers to do reliably.

This paper reports on a controlled experiment in which novices attempted to use a prototype Linked Data interface to both find and encode bits of everyday knowledge. The application presents a user-friendly veneer to the Semantic Web, manifesting the essential graph-based nature of the data model while shielding the user from the complexity of syntax. This allows us to study user behavior in attacking the deep, cognitive problem: breaking down knowledge into the triple-based structure required by RDF Linked Data. Our study sheds light on some of the key aspects of knowledge formulation that novices struggle with, and suggests several specific design approaches for Linked Data authoring environments that, as our experiment makes clear, address crucial issues.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Interaction styles; H.5.4 [Hypertext/Hypermedia]: User issues.

General Terms
Semantic Web, Linked Data, User Interface Design, Experimentation, Human Factors.

1. INTRODUCTION
A successful, global-scale Semantic Web presupposes large amounts of instance data available for machines to process. As Tom Mitchell summarized in his ISWC 2009 keynote address [14], there are essentially three ways to produce this: (1) humans entering structured information, (2) database owners publishing their data in RDF format, and (3) employing automated natural language processing techniques to “read” unstructured Web data.

One might suppose that the only major impediment to (1) is convincing the masses that they have an incentive to do this. But in addition to the issue of motivation, serious questions arise about novices' ability to generate Linked Data in the format required by the Semantic Web. Formal knowledge representation is difficult and error-prone for most non-technical people. It is a very different activity from writing in natural language, which is the way that most laypeople have contributed to the Web to date. Authoring Linked Data demands an unswervingly consistent naming scheme, an unprecedented level of exactitude, fluency with a new suite of concepts, and adherence to a set of rigid and (to the layperson) seemingly arbitrary rules that run counter to the way most people think, let alone converse. Though some psychologists (e.g., [1,10,19]) have thought semantic networks to be reflective of the way human memories are encoded, one only has to watch a novice struggle with expressing even basic concepts in a graph-based knowledge structure to know that this activity is extremely challenging.

We believe that for non-specialists to be successful in contributing to the Web of Data, they must use tools designed to compensate for their weaknesses. The design of such tools should be informed by empirical studies that illuminate how target users actually go about generating Linked Data, so that strengths can be maximized, weaknesses complemented, and unfruitful trends redirected.

The immediate goal of the work presented in this paper is not so much to design the ultimate Linked Data authoring environment as to empirically verify which aspects of such environments might be beneficial or harmful. By studying user behavior under simulated conditions, and observing which specific aspects of the Linked Data authoring process prove to be obstacles, we illuminate the nature of the problem and offer experimentally driven guidance on how to make end users successful.

The remainder of this paper is organized as follows. First, we describe related work in user studies of knowledge formulation processes and tools. Then, we introduce OKM¹, the prototype Linked Data authoring tool used in our experiments, highlighting key features whose viability we focused on in our study. We then describe the nature of our usability experiment, and present and interpret a quantitative analysis of the results. Finally, we summarize our findings and make generalizations and recommendations for future interfaces to Linked Data applications.

¹ OKM is a recursive acronym which stands for “OKM Knowledge Management,” and is pronounced as “Occam.” The prototype application is open-source and publicly accessible at http://sourceforge.net/projects/okm.

Copyright is held by the author/owner(s).
LDOW2010, April 27, 2010, Raleigh, USA.
2. RELATED WORK
A wide array of tools has appeared in the last several years to help users in the RDF generation process. These include everything from semantic wikis (e.g., Platypus[7], Semantic Mediawiki[11], IkeWiki[17]) to semantic annotation tools (e.g., Loomp[12], OntoAnnotate[20]) to RDF editors (e.g., OntoWiki[2], Tabulator[4], IsaViz[16]) to full-blown ontology management environments (e.g., Protege[15], Swoop[9]). With few exceptions, however, published reports on these tools have not included usability studies to evaluate their effectiveness or to identify the cognitive barriers users may face when using them. The result is a body of literature that contains many innovative and potentially useful user interface ideas, but no core set of principles whose effectiveness has been proven and which can guide further work.

We mention here two notable efforts which did include illuminating usability studies. One was conducted by Staab et al.[20], who performed an in-depth analysis of the behavior of nine experimental subjects who used the OntoAnnotate semantic annotation tool. Their primary measure was inter-annotator agreement; that is, the degree to which different participants independently annotated a page in the same way. Their conclusion, roughly speaking, was that novices to the Semantic Web, operating in a domain where they are not experts, will not in general produce high-quality structured knowledge, or at least not knowledge on which different annotators agree. If nothing else, this confirms the difficulty of the problem laypeople face.

Noy et al.[15], on the other hand, performed an experiment in which military domain experts used a version of Protege-2000 with domain-specific extensions to perform specific knowledge acquisition tasks. The structure of the knowledge base given to participants was very detailed, and comprised a precisely specified class hierarchy containing concepts (e.g., types of combat units) that participants used on a daily basis. Unlike Staab et al.'s, Noy et al.'s conclusion was optimistic: these domain experts, with 1-2 hours of training but no computer science background, were in fact able to effectively use a large knowledge base that concerned a domain with which they were intimately familiar. The contrast between these two studies' outcomes testifies to the impact that domain expertise and domain-specific tools can have. The subjects in Staab et al.'s study, who used a general tool on general subject matter, had much greater difficulty. Clearly the more challenging user interface problem is to equip novices with a tool that is not custom-tailored to any particular subject matter, but which facilitates the proper construction of valid Linked Data on any topic, even one in which users do not begin with expert-level conceptions.

The setting we explore is more reminiscent of Staab et al.'s study, since we are focusing on laypeople (not domain experts) who are tasked with formulating generalized, open-ended knowledge. Our work differs from each of these efforts in that we are examining the effect and usage of specific user interface features, with the goal of discovering how a general Linked Data editor would best be designed. In particular, we analyze user behavior in choosing resources versus literals to represent information, the efficacy of employing types and templates in the interface to steer users towards data consistency, and alternative ways to express n-ary relations. None of these UI aspects has, to our knowledge, been empirically studied in a focused, experimental setting.

3. OKM FEATURES
3.1 Basic Design
OKM’s primary purpose is to serve as a test bed for analyzing how laypeople interact with Linked Data tools, and its basic design is common to many state-of-the-art RDF and ontology editors. This commonality is key in relating OKM to tools currently in use by the Semantic Web community; with it, we hope to generalize the results we obtain from empirical testing to Linked Data authoring as a whole.

For instance, like OntoWiki[2], Tabulator[4], Kiwi[17], Semantic Wikipedia[22], and many other tools, OKM's pages are “resource-centric” in that each page represents a single resource, displaying all the properties relating to that resource. Hyperlinks to related resources can be used to traverse the site. As with Freebase[5], users primarily interact with the system in terms of human-readable names (HRNs) rather than full URIs. At resource creation time, OKM auto-generates a globally unique URI for that resource (scoped to the domain name of the OKM server), but users continue to work with HRNs in order to reduce screen clutter and let them focus on semantics rather than syntax.

Users can add datatype or object properties to a resource directly from its page. In the interface, OKM refers to datatype properties (whose values are literals) as “attributes” and object properties (whose values are resources) as “statements.”² (We will use this terminology throughout the remainder of this paper.) The use of two terms (instead of calling everything a “triple”) is intended to help the user better appreciate the distinction between them, since they are created, presented, and navigated differently. If the user chooses to add an “attribute,” the property value will be interpreted as a literal of a primitive data type. If the user chooses to add a “statement,” the property value will be interpreted as the HRN of another resource. For statements, the user can specify an existing resource in the system as the object (at which point the new resource is effectively “stitched in” to the rest of the graph) or else refer to a resource which does not yet exist, which will implicitly create that resource.

Users can also search the system for resources by typing in a search box that autocompletes based on HRNs, or any portion thereof (e.g., typing “lin” will match a resource whose HRN is Abraham Lincoln). This functionality is of course common to innumerable tools today, from Freebase[5] to IsaViz[16] to non-Semantic-Web tools like Wikipedia and the Google search interface. Also, an explicit “create” box allows resources to be created from scratch, and not (initially) connected to anything.

Again, since this design is similar in spirit to that of many tools in existence today, we believe that empirical findings based on OKM's interface will be of broad interest to the community of Linked Data researchers studying user interfaces.

² We chose these words based on survey feedback from a previous experiment[8] in which users were asked for the most intuitive terms for the two concepts.
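A minimal Python sketch of the HRN-based design described in Section 3.1 follows. It is not OKM's implementation (OKM stores its data in a Jena triple store); the classes `Triple` and `HrnStore`, their methods, and the `http://<domain>/resource/<hex>` URI pattern are hypothetical, and the sketch assumes for simplicity that each HRN names exactly one resource. It shows how a user can keep typing human-readable names while each name is bound to an auto-generated, server-scoped URI; how attributes keep their values as literals while statements resolve theirs to (possibly newly created) resources; and how substring autocomplete over HRNs behaves.

```python
# Hypothetical sketch of an HRN-centric store; not OKM's actual code.
import uuid
from dataclasses import dataclass, field


@dataclass
class Triple:
    subject: str      # URI of the subject resource
    predicate: str    # predicate name as typed by the user
    obj: str          # a URI for statements, a literal string for attributes
    is_literal: bool  # True for an "attribute", False for a "statement"


@dataclass
class HrnStore:
    """Toy store in which users type HRNs while URIs are minted behind the scenes."""
    domain: str                                  # hypothetical OKM server domain
    uri_for: dict = field(default_factory=dict)  # HRN -> URI (assumes unique HRNs)
    hrn_for: dict = field(default_factory=dict)  # URI -> HRN
    triples: list = field(default_factory=list)

    def resource(self, hrn: str) -> str:
        """Return the URI bound to an HRN, minting a new one on first use."""
        if hrn not in self.uri_for:
            uri = f"http://{self.domain}/resource/{uuid.uuid4().hex}"
            self.uri_for[hrn] = uri
            self.hrn_for[uri] = hrn
        return self.uri_for[hrn]

    def add_attribute(self, subject: str, predicate: str, value: str) -> None:
        # The value stays a literal; it never becomes a resource.
        self.triples.append(Triple(self.resource(subject), predicate, value, True))

    def add_statement(self, subject: str, predicate: str, obj_hrn: str) -> None:
        # The value is read as an HRN; an unknown HRN implicitly creates a resource.
        self.triples.append(
            Triple(self.resource(subject), predicate, self.resource(obj_hrn), False))

    def search(self, fragment: str) -> list:
        """Autocomplete: HRNs that contain the typed fragment anywhere, ignoring case."""
        needle = fragment.lower()
        return sorted(hrn for hrn in self.uri_for if needle in hrn.lower())


if __name__ == "__main__":
    store = HrnStore(domain="okm.example.org")
    store.add_attribute("Abraham Lincoln", "born", "1809")
    store.add_statement("Abraham Lincoln", "marriedTo", "Mary Todd")  # creates "Mary Todd"
    print(store.search("lin"))  # ['Abraham Lincoln']
```

In this sketch the explicit “create” box corresponds to calling `resource()` on its own, which mints a URI for a resource that is not yet connected to anything.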
Figure 1. The basic OKM interface, in “view” mode. (SAT version.)

3.2 Linked Data Publishing
OKM stores all information that the user creates in a local Jena RDF triple store[13]. Appearing in the upper corner of every OKM page is a “Publish” link which, if pressed, will generate Linked Data for the currently-displayed resource in RDF/XML format. This Linked Data is stored in a file in a configurable location on the web server that is hosting the OKM installation. It can then be accessed over the Web by dereferencing the URI that OKM auto-generated for the resource, according to Linked Data principles[3]. Note that the RDF/XML file will contain a serialization of (1) all triples for which the currently-displayed resource is the subject, and (2) rdfs:seeAlso links to the URIs of resources that appear as the subject of a triple for which the currently-displayed resource is the object.

If the user presses the “Publish” link while viewing the OKM home page, Linked Data for all resources in the local system will be generated. The entire knowledge base will thus be globally exposed to Linked Data consumers.

In this way, Semantic Web amateurs can be empowered to contribute to the Linked Data movement by using a tool that has a low barrier to entry and shields them from the syntactic complexities of RDF. Note that the current version of OKM does not support “round-trip” knowledge creation, whereby existing Linked Data (and ontologies) can be imported into the tool. This feature was postponed since it did not bear upon our immediate experimental concern; in future studies, however, we plan to implement it and study user behavior in interacting with a larger, pre-existing knowledge space (in which there is greater urgency to find and reuse existing resources).

3.3 Experimental Features
Supplementing this normative user interface are three atypical features, which formed the focus of most of the investigative effort described in this paper. We hypothesized that each of these changes to the pseudo-standard user interaction paradigm would prove beneficial to novices attempting to interact with Semantic Web data, each for different reasons.
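The publishing rule described in Section 3.2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OKM's code: it emits N-Triples rather than the RDF/XML that OKM produces (to avoid depending on an RDF library), it assumes the rdfs:seeAlso link runs from the published resource to each referring resource, and the names `Triple`, `describe`, `to_ntriples`, and `publish_all` are hypothetical.

```python
# Hypothetical sketch of per-resource Linked Data publishing; not OKM's actual code.
from typing import NamedTuple

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


class Triple(NamedTuple):
    s: str
    p: str
    o: str
    literal: bool = False  # True when o is a literal value rather than a URI


def describe(store: list, uri: str) -> list:
    """Triples for one resource's published document."""
    # (1) every triple in which the resource is the subject
    outgoing = [t for t in store if t.s == uri]
    # (2) resources using it as an object get a link, not a copy of their triple
    referrers = sorted({t.s for t in store if not t.literal and t.o == uri})
    return outgoing + [Triple(uri, RDFS_SEE_ALSO, r) for r in referrers]


def to_ntriples(triples: list) -> str:
    """Serialize as N-Triples (a stand-in for the RDF/XML that OKM emits)."""
    def term(value: str, literal: bool) -> str:
        if not literal:
            return f"<{value}>"
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return "\n".join(f"<{t.s}> <{t.p}> {term(t.o, t.literal)} ." for t in triples)


def publish_all(store: list) -> dict:
    """'Publish' from the home page: one document per resource in the store."""
    resources = {t.s for t in store} | {t.o for t in store if not t.literal}
    return {uri: to_ntriples(describe(store, uri)) for uri in sorted(resources)}
```

Note the asymmetry this rule produces: a resource's own document carries its outgoing triples in full, but incoming triples appear only as rdfs:seeAlso pointers, so a consumer must dereference the referring resource to read the actual triple.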
3.3.1 Roles and Templates
Rather than presenting all properties of a given resource in one long display, OKM encourages (and in fact mandates) organization of these properties according to the resource's “roles.” A role is essentially an rdfs:Class to which the resource belongs, and which acts as the rdfs:domain (or rdfs:range) of the properties relevant to that class. Consider the screenshot in Figure 1. Here, the “Leonardo da Vinci” resource (which of course has a unique URI but is presented to the user in terms of its HRN, as described above) has three roles: Artist, Person, and Scientist. Each role is manifested as its own box, with the relevant statements as contents. A given triple about Leonardo da Vinci will appear in the role box that represents the domain for that triple (or, if Leonardo da Vinci is the object rather than the subject of the triple, in the role box that represents the range; the “Giorgio Vasari” triple is an example of this latter case).

In order to add an RDF triple to the system, the user must choose one of the resource's roles (or add a new role) to serve as the domain of the triple, and then add the triple in the corresponding role box. The user begins this process by clicking on the “Edit” link at the top of the page, thereby putting the page in “edit mode” (see Figure 2 for an example). The role boxes then acquire buttons labeled “+Attribute” and “+Statement,” which can be used to add attributes or statements to that role box. The user can then type the name of a predicate and a value. An autocomplete function assists the user with both inputs, offering to match predicates already in the system and (in the case of statements) HRNs of resources already in the system. It is perfectly permissible, however, for the user to type the name of a new predicate and/or the name of a new resource, in which case the new item is implicitly created. The new predicate is automatically given a domain based on the role box it was added to, and a range based on the role of the object value it was given. (For object resources with multiple roles, the “Set Role” button can be used to select which of the resource's roles should be the range of the predicate.) From that point forward, the system incorporates the new predicate into its ever-evolving schema.

One important aspect of roles is that in edit mode, a template appears within each role box that displays the predicates already known to have that role as a domain. Using these templates is similar to inserting data in Freebase's type-based editing model [6]. In Figure 2, note the predicates “dimensions,” “period,” and “influenced,” which appear in grey. These predicates (which are absent when the resource is being seen in “view mode”) appear in the box because at least one other resource with the “Painting” role has a triple involving each of these predicates. Pressing the “Add Value” button next to a grey item will prompt the user for a value for that item. In this way, the template suggests to the user possible predicates that are consistent with the schema that exists thus far.
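The schema bookkeeping behind roles and templates can be sketched as follows. The sketch assumes (these are not details given for OKM) that a predicate's domain and range are fixed the first time it is used, that a grey template row is shown only for predicates the resource has no value for yet, and that the range is left unset when the object has several roles and none was chosen with “Set Role.” The class `Schema` and its methods are hypothetical, and only the subject side of role boxes is modelled.

```python
# Hypothetical sketch of role-based schema inference and templates; not OKM's actual code.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Schema:
    """Toy model of roles acting as the rdfs:domain / rdfs:range of predicates."""
    roles: dict = field(default_factory=dict)    # resource HRN -> set of role names
    domains: dict = field(default_factory=dict)  # predicate -> role used as its rdfs:domain
    ranges: dict = field(default_factory=dict)   # predicate -> role used as its rdfs:range
    triples: list = field(default_factory=list)  # (subject, predicate, object) HRN triples

    def add_role(self, hrn: str, role: str) -> None:
        self.roles.setdefault(hrn, set()).add(role)

    def add_statement(self, subject: str, role_box: str, predicate: str,
                      obj: str, object_role: Optional[str] = None) -> None:
        """Add a statement through one of the subject's role boxes."""
        if role_box not in self.roles.get(subject, set()):
            raise ValueError(f"{subject!r} has no role {role_box!r}")
        if predicate not in self.domains:
            # New predicate: its domain is the role box it was typed into ...
            self.domains[predicate] = role_box
            # ... and its range is the object's role. If the object has several roles,
            # object_role stands in for the "Set Role" choice; without it, no range is set.
            obj_roles = self.roles.get(obj, set())
            if object_role is None and len(obj_roles) == 1:
                object_role = next(iter(obj_roles))
            if object_role is not None:
                self.ranges[predicate] = object_role
        self.triples.append((subject, predicate, obj))

    def template(self, hrn: str, role: str) -> list:
        """Grey template rows: predicates whose domain is `role` that `hrn` has no value for."""
        known = {p for p, d in self.domains.items() if d == role}
        filled = {p for s, p, _ in self.triples if s == hrn}
        return sorted(known - filled)


if __name__ == "__main__":
    s = Schema()
    for hrn, role in [("Mona Lisa", "Painting"), ("The Last Supper", "Painting"),
                      ("Leonardo da Vinci", "Artist"), ("Renaissance", "Period")]:
        s.add_role(hrn, role)
    s.add_statement("The Last Supper", "Painting", "period", "Renaissance")
    s.add_statement("The Last Supper", "Painting", "paintedBy", "Leonardo da Vinci")
    print(s.template("Mona Lisa", "Painting"))  # ['paintedBy', 'period']
```

Because template rows are derived from the shared `domains` map, a predicate introduced once for any Painting is suggested on every other Painting, which is the mechanism by which templates are meant to nudge users toward reusing existing predicates.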


Figure 2. The basic OKM interface, in “edit” mode. (SAT version.)
In any fairly complex knowledge base, we can predict that most users will be unable to keep track of all the predicates in use and will inevitably use different predicates to represent the same semantic concept. OKM's templates are designed to guide users into editing resources in such a way that they stay within the current schema, while not constraining them from adding to that schema.

In summary, then, roles are intended to provide three benefits:
1. They lend organization to the display when a resource has many triples, making information easier to find and enter.
2. They ensure richer data (with domain, range, and type information) than novice users would ordinarily produce. It seems likely that when authoring only simple triples, most novice users would not bother to assign types to their resources, nor domains and ranges to their predicates. (At the least, it is unlikely that they would do this consistently.) With OKM, however, the act of assigning types, domains, and ranges is built into the very process of creating triples, making it convenient to do and impossible to avoid.
3. They provide a template of relevant predicates that is easy for users to fill out. This not only provides immediate ease of use, but also promotes long-term data consistency.

We present benefits 1 and 2 without proof. Later in this paper, we provide an in-depth empirical analysis to judge the efficacy of benefit 3.

3.3.2 The Elimination of Attributes (Literals)
The flexibility that RDF offers in supporting both resources and literals as object values is a mixed blessing. On the one hand, it presents an expressive modeling device. The objects of the triples “John marriedTo Sally” and “John weightInPounds 175.4” seem inherently different: “Sally” is presumably a bona fide resource in her own right, with other triples expressing information about her, whereas “175.4” intuitively seems like a primitive piece of raw data, undeserving of resource status. Allowing authors to designate an object as one or the other affords the opportunity to express this subtle aspect of the object.

On the other hand, the existence of the distinction means that authors are forced to choose between the two alternatives, and the choice is not always easy to make. Consider triples like “BeverlysToyota color red,” “Charlie bornIn 1982,” and “Candice schoolYear sophomore.” The object values “red,” “1982,” and “sophomore” might be considered literal pieces of data, as with the above weight example, or first-class resources. Anyone who has composed RDF for any length of time knows that this choice presents itself at every turn, and that in some cases it feels almost arbitrary.

Our work presents two contributions toward better understanding this phenomenon and how best to handle it. First, by building a system that lowers the barrier to entry for creating RDF and that supports both statements and attributes, we can observe how uninitiated users tend to differentiate between the two in practice. Later in this paper we present findings that reveal user tendencies in choosing between statements and attributes for specific types of information, and an analysis of the degree of consistency laypeople exhibit in this choice.

Second, we explore the effects of an RDF editor in which attributes are simply eliminated. It is possible, of course, to do away with the concept of literals completely if one is prepared to accept elements like “175.4” as resources. This is one way of dispensing with both the angst users face in making the decision and the inconsistency that can result when users make different choices: simply take away the choice altogether. This may seem like a heavy-handed solution, but it is not without theoretical merit. Consider that more than one prominent cognitive psychologist (e.g., [1, pp.125-7; 10, pp.34-92; 23]) has formulated a knowledge representation theory based on something akin to semantic networks, yet found no need to differentiate between resources and literals. One kind of node is all that comprises these knowledge structures, which suggests that a “resources only” network is in fact sufficient to encode human knowledge, and places the burden of proof on those who argue for the existence of two distinct kinds of entities.

As described below, we deployed to experimental subjects not only the version of OKM depicted in Figures 1 and 2, but also versions in which attributes were completely eliminated. The “+Attribute” button was removed from all displays, which effectively forced users to model everything as statements. We then compared accuracy, consistency, and user satisfaction between the different versions.

3.3.3 Predicate Modifiers
Lastly, OKM allows users to construct n-ary relations without explicitly using reification. This feature was inspired by a recent project in which we conducted a pencil-and-paper experiment[8]. In this study, young adults with no previous Semantic Web experience were asked to construct knowledge representations (both visually and textually) corresponding to English sentences. Some of these sentences contained facts which were inherently n-ary: “Muhammad Ali fought Joe Frazier in Detroit,” for example. (This statement relates three entities and hence cannot be expressed as simple subject-predicate-object triples without reifying the verb.) Our participant pool was divided so that half of them were shown solutions to such sentences using traditional reification techniques: first, create a resource representing the verb (“AliFrazierDetroitFight,” perhaps) and then attach the other resources to it with predicates like “participant” and “location.” The other half of the participant pool was instead shown solutions involving predicate modifiers: they were permitted to break outside the strict triple scheme and augment a triple with further information indented beneath it. (This is the scheme supported by the Yago knowledge model[21].) To illustrate, users could express the above sentence textually as:

  MuhammadAli fought JoeFrazier
      in Detroit

This is really nothing more than a shorthand notation for treating the first triple as the subject of a second triple, but it proved to have an enormous impact on user success. (As an example of the size of the effect, for one of the items 62% of participants were able to correctly express the sentence using predicate modifiers, compared with 14% using traditional
reification.) The overall conclusion is that end users can be far more successful in constructing n-ary relations when enabled to employ predicate modifiers than when they are forced to express them as reified triples.

Guided by these findings, we implemented a predicate modifier scheme in OKM. An example is the “Leonardo da Vinci painted Mona Lisa” fact in Figure 1. Note that “with: oils” is an attribute and “for Lisa del Giocondo” is a statement, and that both are indented underneath the “painted” triple. Users create such indented facts by pressing the “+Attribute” or “+Statement” buttons next to a triple, rather than at the top of the role box (refer to Figure 2). When generating Linked Data, OKM converts these indented facts into traditionally reified triples, so that the knowledge is compatible with all current Semantic Web tools. From a user interface perspective, however, users never see the complexities of reification: they view and edit n-ary relations in terms of the much more intuitive predicate modifiers.

4. EXPERIMENT
4.1 Hypotheses
We formulated the following hypotheses to evaluate experimentally.
Regarding the “roles and templates” feature:
• H1A – The addition of a “roles and templates” feature will significantly increase laypeople's ability to correctly formulate Linked Data: i.e., the RDF they generate will make more sense semantically.
• H1B – The addition of this feature will increase the likelihood that laypeople will consistently formulate Linked Data: i.e., they will more often reuse appropriate predicates that already exist.
• H1C – Users will in general employ the role feature properly by selecting appropriate roles (and thus incidentally contribute meaningful domain, range, and type information).

relations, users will choose the latter significantly more often, and have more success in doing so.
• H3B – The presence of the predicate modifier feature will have no significant negative impact on laypersons' knowledge generation: i.e., it will rarely if ever be misapplied to produce errant knowledge.

4.2 Participants
Our participant group consisted of 71 college students aged 18 to 22, split roughly evenly by gender. All students were enrolled at the University of Mary Washington during the Spring 2010 semester and represented a wide range of majors.

4.3 Procedure and Materials
Participants took the one-hour experiment using the Firefox web browser on either a Windows or UNIX workstation. A ten-minute demonstration and explanation of OKM was given, and then each participant received an experiment packet and was directed to a URL (unique for each participant) that housed an OKM deployment with a pre-fabricated knowledge base containing about 130 resources and 150 predicates. The packet included 10 questions to be answered using this knowledge base (Part 1) and 24 facts to be added to it (Part 2). The final part of the packet (Part 3) was a survey to help us better analyze how the participants reacted to the system.

Part 1 questions ranged from easy to difficult, depending on how hard it was to find the information in the system. Easy questions were ones where the participant had to locate a specific resource page in the system and the answer was directly on that page. For example, “How tall is Jason Thompson?” The “Jason Thompson” resource existed in the system and the answer could be found on that page. More difficult questions forced the participant to view multiple pages and traverse links within the pages to locate the answer. For example, “What ballpark does Todd Helton’s baseball team play in?” The participant had to first find the “Todd Helton” resource page, and then find and click the link to the “Colorado Rockies” page in order to find the name of the sports facility in which the team played. Part 1 also acted as practice to help the participants
                                                                      become more comfortable and aware of the system and how it
•         H1D – Users will in general select roles consistently       was organized.
      with one another (i.e., if two users separately encode the
      same bit of knowledge, they are very likely to select the       Regarding hypothesis H3A, it is important to note that the last
                                                                      two Part 1 items involved n-ary relations, but that the pre-
      same role under which to create the triple.)
                                                                      fabricated knowledge base had encoded one of them using
Regarding the “elimination of attributes” feature:                    predicate modifiers, and the other using predicate reification.
                                                                      The two items had nearly identical structure: “For what novel
•          H2A – If given the choice of creating an attribute or a
                                                                      did Ernest Hemingway win the Pulitzer Prize?” and “For what
      statement for a given bit of knowledge, there will be no        film did Martin Scorcese win the Academy Award for Best
      predictable consensus among of a group of laypeople. They       Director?” Hence in answering this question, all participants
      will very often make inconsistent choices with one another,     witnessed properly encoded examples of both predicate
      leading to gross inconsistencies in a collaborative             reification and predicate modifiers. They were then presumably
      knowledge base.                                                 not biased in either direction when beginning Part 2, which
•          H2B – Laypeople who employ a Linked Data interface         required the encoding of five n-ary relations among its 24 items.
      that eliminates attributes altogether will suffer no            Part 2 had the participants add data to the knowledge base. This
      disadvantages: the data they generate will be as correct as     part had a range of difficulty levels just as Part 1 did. The facts
      those who have both statements and attributes available.        were presented as sentences with each sentence having one to
                                                                      three small facts within it. For example, “Madison Square
Regarding the “predicate modifiers” feature:                          Garden is located in New York City” has one fact: the fact that
•          H3A – Given examples of both predicate reification         Madison Square Garden is in New York City. “Mark David
      (traditional) and predicate modifiers (as described above),     Chapman assassinated John Lennon on December 8, 1980 at the
                                                                      Dakota Apartment Complex,” on the other hand, has three facts:
      and the choice to use either technique to express n-ary
                                                                      the fact that Mark David Chapman assassinated John Lennon,
and the date and the place of the assassination. (Note that this is an n-ary relation.) Resources referred to in Part 2 did not always exist in the pre-fabricated knowledge base, requiring the participant to create a resource before adding the fact.
Our participants were split into four groups based on which version of the program they used. The four versions were:

ST (18 participants) – A “statements only” interface (i.e., no attributes) that provided role-based template information when editing resources (as described above). This is the version of the interface that we hypothesized would be the most effective, since it incorporated all three of the experimental features described above.

S (19 participants) – A “statements only” interface with no templates. This version was identical to ST, except that in edit mode the greyed-out “suggested” predicates (such as “dimensions,” “period,” and “influenced” in Figure 2) did not appear.

SAT (18 participants) – A “statements + attributes” interface with templates. This version was identical to ST, except that the “+Attribute” buttons were included so that users could choose between creating attributes or statements. (This is the version of the interface depicted in Figures 1 and 2.)

SA (16 participants) – Finally, in order to test hypothesis H2A, a number of participants received a more “traditional” version of OKM that permitted both statements and attributes, but provided no templates.

We then evaluated our hypotheses by judging the contents of the Linked Data knowledge bases that users produced while carrying out the actions required in Part 2. We did this in the following way:

• H1A – compare groups S and ST for correctness.
• H1B – compare groups S and ST on the items for which an appropriate predicate already existed in the pre-fabricated knowledge base, to determine whether they used that predicate.
• H1C – judge all groups on how often the roles under which they chose to put a triple were conceptually correct. This was admittedly somewhat subjective, but in practice there was very little debate among the graders (the four authors of this paper) as to whether a role was correct.
• H1D – for each item, evaluate how consistently participants chose the same role, using Simpson's diversity index [18].
• H2A – for participants in groups SAT and SA, evaluate the degree of consensus participants exhibited in choosing attributes or statements to represent the information.
• H2B – compare groups SAT and ST for correctness.
• H3A – for the five Part 2 items requiring n-ary relations, count the number of times participants (in all groups) used predicate reification versus predicate modifiers to correctly encode them.
• H3B – for the nineteen Part 2 items that did not require n-ary relations, count the number of times participants mistakenly used the predicate modifiers feature and generated nonsensical Linked Data as a result.

5. RESULTS

5.1 Results: Roles and Templates
5.1.1 The Effect of Templates
We evaluated our template-related hypotheses using nine specific items. For each of these items, the system contained a predicate, associated with an appropriate role, that users should have noticed and could have selected using the template. However, these items fell into two groups. For six of them (items J, K, L, P, W, and X) the already existing predicate was in fact semantically appropriate for the item. For instance, for item J, “David Beckham scored 27 goals,” the pre-fabricated knowledge base contained the predicate “goals” for the “Soccer Player” role. Therefore, it would have been appropriate and consistent for ST users to select “goals” from the template to record this item (as opposed to creating their own equivalent predicate such as “scored” or “numberOfGoals”).
For the other three items, however, the already existing predicate was not semantically appropriate for the item. We called these items “traps.” For example, the pre-existing predicate that ST users saw for item O, “The Matrix's gross earnings were $90 million,” was “net earnings.” We included these three items because we wanted to measure the degree of danger templates may introduce by leading users to choose pre-existing predicates that are in fact not appropriate.
We evaluated templates in two ways. First (hypothesis H1A), we compared the total correct responses between the ST and S groups for the nine items, treating each user's response for each item as a separate trial. Considering all nine, 70.37% of ST users' representations (114 out of 162) were correct, as opposed to 67.25% (115 out of 171) of S users' representations. This difference was not statistically significant (p > 0.01 by Fisher's exact test). This indicates that templates do not improve the correctness of user-generated data, thereby refuting hypothesis H1A. (Interestingly, when considering only the six non-trap items, the ST group got 65.74% correct compared with S's 61.40%; for the trap items, ST got 79.63% and S 78.95%. Neither difference was statistically significant. It appears, then, that templates have no impact on correctness, regardless of whether the predicates in question are in fact appropriate.)
We also studied the impact templates have on the consistency of data (H1B): i.e., the likelihood that data authors would re-use an appropriate predicate already existing in the system as opposed to creating a synonymous one. The effects of templates on the predicates used for the nine items are summarized in Table 1.

Table 1. Effect of role templates on predicate usage.

              ST: predicate used     S: predicate used
  Item        Existing    New        Existing    New       P-value
  J              7         11           0         19       0.0031
  K             10          8           0         19       0.0001
  L             15          3           4         15       0.0002
  P              1         17           0         19       0.4865
  W              5         13           1         18       0.0897
  X              3         15           0         19       0.1050
  (Traps:)
  C             18          0          19          0       1.0000
  O             12          6          19          0       0.0080
  V             17          1          19          0       0.4865
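The P-value column of Table 1 can be checked from the counts alone. Below is a minimal Python sketch of a two-sided Fisher's exact test, assuming the common convention of summing the probabilities of every table with the same margins that is no more likely than the observed one (this is also the default two-sided behavior of SciPy's `scipy.stats.fisher_exact`). It further assumes that each row of Table 1 is read as the 2×2 table [[ST existing, ST new], [S existing, S new]]. The helper `fisher_exact_two_sided` and the `table1` dictionary are illustrative names introduced here, not part of OKM.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same row and column
    totals whose probability does not exceed that of the observed table.
    """
    row1 = a + b          # e.g. all ST responses for one item
    col1 = a + c          # e.g. all responses that used the existing predicate
    n = a + b + c + d

    def weight(x):
        # Hypergeometric numerator for the table whose top-left cell is x;
        # every such table shares the denominator comb(n, row1).
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Integer weights keep the "no more likely than observed" comparison exact.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, row1)


# Counts from Table 1 as (ST existing, ST new, S existing, S new).
table1 = {
    "J": (7, 11, 0, 19), "K": (10, 8, 0, 19), "L": (15, 3, 4, 15),
    "P": (1, 17, 0, 19), "W": (5, 13, 1, 18), "X": (3, 15, 0, 19),
    "C": (18, 0, 19, 0), "O": (12, 6, 19, 0), "V": (17, 1, 19, 0),
}

for item, counts in table1.items():
    print(item, f"{fisher_exact_two_sided(*counts):.4f}")
```

Under that reading, the printed values agree with the P-value column of Table 1 to four decimal places. In practice a library routine such as SciPy's would normally be preferred; the sketch only makes the test's definition explicit.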


The results show that users in the ST group were significantly (p < 0.01 by Fisher's exact test) more likely to (correctly) use the existing predicate in three of the six non-trap cases, and to (incorrectly) use it in one of the three trap cases. This is a mixed result. Evidently, templates effectively promote consistency for some facts but not for others, and they mislead users into semantic incorrectness for some facts but not for others. Hypothesis H1B appears to be confirmed only in certain cases.
When we probe the specific items to discover which kinds of items templates assist, we find that it depends greatly on word choice. Templates were not shown to be helpful for items P (“The song 'Stairway to Heaven' featured Jimmy Page on guitar”, with predicate “plays” defined for role “Musician”), W (“John Entwistle played bass on the song 'Behind Blue Eyes'”, with predicate “plays” defined for role “Musician”), and X (“Paul McCartney wrote the song 'Maybe I'm Amazed'”, with predicate “composed” defined for role “Musician”). The wording of items P and X differs from the defined predicates, while the tense differs for item W. These results suggest that users are less likely to use templates when doing so would require restructuring the sentence or using different terminology.

In a real-world setting, of course, users are not translating sentences into Linked Data, but “mental knowledge” into Linked Data. We can only speculate as to the size of this effect for mental knowledge, but it seems reasonable to assume that if a user wants to encode a fact and has a certain phrasing in mind, they will succumb to the same pitfall that our testers did.
Note that two requirements must be met in order for a user to take advantage of a template: they must (1) select the proper role (i.e., the role that is the domain for the predicate), and they must (2) notice and decide to use the relevant predicate for that template. In cases where the user failed to use templates, neither of these factors was entirely to blame. For items J and P, for example, 81.8% and 82.4% of the failures, respectively, were due to choosing the wrong role; for items W and X, on the other hand, 100.0% and 86.7% were due to not using the right predicate within the (correct) role. It appears possible to fail at either step.
Item O (“The Matrix's gross earnings were $90 million”) illustrates the potential negative impact of templates. Six out of eighteen ST users chose to use the existing predicate “net earnings.” This result indicates a risk that users may select incorrect predicates that are lexically similar to, but semantically different from, the phrases they intend to use. However, this risk can be expected to diminish with substantial domain knowledge, which equips users to correctly differentiate between similar terms.
Overall, these findings suggest that templates assist with both consistency and correctness in predicate usage, despite not being effective in all cases. Templates are helpful when the user selects the appropriate role and when the existing predicate is consistent with the user's intended phrasing of the information.

5.1.2 The Viability of Roles
In order to evaluate the viability of roles, we examine both the correctness (H1C) and the consistency (H1D) of roles chosen for 20 items. We judged correctness by the relevance of the chosen roles to the information entered. For example, we considered appropriate roles for item L (“Deion Sanders has stolen 35 bases”) to include “Baseball Player” and “Athlete” but not “Football Player” or “Person.” Inconsistency is measured for each item using Simpson's Index of Diversity, $D = 1 - \sum_i p_i^2$, where $p_i$ is the proportion of users who chose role $i$. An index of 0 indicates that all users chose the same role; larger values indicate more distinct roles chosen and a more even distribution among them. Both measures are shown in Table 2, divided into two groups: items for which the resource already had an appropriate role, and items for which the user would have to add a role to the resource in order to make a reasonable choice.
The results show that users were very likely (92.81%) to make a reasonable choice if the relevant objects already had an appropriate role, but less likely (73.32%) to add one themselves (p < 0.05 by a t-test). The diversity of roles was high for both kinds of facts, although the second group was more diverse (p < 0.05). This indicates that users are often unable to choose correct roles, and are not reliably consistent with one another. The fact that templates were successful at helping users enter semantically correct data despite these difficulties suggests that users better guided by ontologies might experience greater benefits from a system that incorporates type and schema information.

Table 2. Correctness and consistency of role choices.

Object already had reasonable role
  Item       % Correct    # of Roles     D
  D           98.51%          4         0.19
  E           76.81%          3         0.36
  H           95.59%          4         0.17
  K           94.37%          4         0.13
  M          100.00%          1         0.00
  N           88.57%         16         0.65
  P           98.53%          3         0.31
  S           75.71%         16         0.67
  T           88.73%          6         0.72
  U           95.59%          4         0.53
  V           98.55%          3         0.18
  W           97.06%          3         0.06
  X           98.57%          4         0.08
  Average     92.81%          5         0.31

User forced to add role
  Item       % Correct    # of Roles     D
  F           44.93%          4         0.53
  G           69.23%          6         0.48
  J           89.55%          6         0.58
  K           92.31%          6         0.42
  L           66.20%          9         0.53
  O           75.36%         13         0.47
  Q           77.14%          9         0.43
  R           71.83%          4         0.53
  Average     73.32%          7         0.50

5.2 Results: Elimination of Attributes
To determine whether the presence of attributes in the system influences users' ability to represent data, each item was rated for correctness. A response was considered correct if it accurately conveyed the information given in the text, was consistent with the graph-based data model, and was associated with a reasonable role. Table 3 compares the SAT and ST groups for overall correctness of data entry.

The results show no significant difference between the two groups (by Fisher's exact test, α = .01). (Nor was there a significant difference on any individual item.) This finding is consistent with our hypothesis that the presence of attributes in the system would not influence the quality of data produced by users. Thus hypothesis H2B is confirmed.
We also hypothesized that, when forced to choose whether to model a fact as a statement or as an attribute, users would behave inconsistently with one another. Table 4 shows the data we collected to evaluate this hypothesis. The 24 sentences
contained 30 atomic facts, which are divided into five categories based on whether their objects are proper nouns (14 facts, such as “New York City”), common nouns (5 facts, such as “piano”), numeric values (7 facts, such as “186 pounds”), dates (3 facts, such as “April 4, 2008”), or years (1 fact, “2004”).

Table 3. Impact of attributes on correctness.

  Group    Correct    Incorrect    % Correct
  SAT        336          96         77.78%
  ST         353          79         81.71%
  (p = .18)

Table 4. Consistency of the attributes vs. statements choice among various types of data.

  Category        Number of items    % attributes
  Common nouns           5              27.10%
  Date                   3              56.50%
  Numeric                7              74.30%
  Proper nouns          14              16.20%
  Year                   1              20.70%

Our users demonstrated the highest consistency for proper nouns and the lowest for numeric values. However, it is clear that consistency cannot be counted on for any of the five types. No matter what type of fact is being represented, different novice users will encode it in different ways (some as attributes, some as statements), leading to basic inconsistencies in the resulting structure of the Linked Data.

Evaluating consistency in the abstract is difficult, but we note certain items that show a surprising lack of consensus. For instance:

  Item F: “Kelly Witt is a freshman.” (14 attributes, 18 statements)
  Item S: “Michael Abram stabbed George Harrison on Dec. 30th, 1999” (the date: 19 attributes, 12 statements)
  Item Q: “Deion Sanders hit 7 home runs” (24 attributes, 9 statements)

Users exhibit no strong consensus regarding how best to model these pieces of information, and many others, confirming hypothesis H2A.
For some items, a substantial number of users made very unintuitive choices. Item E, clearly a numeric value (“Ryan Medina's GPA is 2.79”), was represented as a statement 11 out of 33 times (33.3%). Even more problematic is the tendency to represent proper nouns as attributes. For item A (“Madison Square Garden is located in New York City”), 9 out of 30 users chose to represent New York City as an attribute (30.0%). And for item N (“Mark David Chapman assassinated John Lennon on December 8, 1980 at the Dakota Apartment Complex”), 17 out of 31 users represented Dakota Apartment Complex as an attribute (54.8%). We argue that representing proper nouns like “New York City,” about which many things on the Semantic Web are likely to be said, as anything other than resources is a mistake, and that untrained users are likely to make that mistake often when given the choice. In the absence of a compelling reason to do so, we recommend that systems not force users to make the choice between resources and literals.

5.3 Results: Predicate Modifiers
To evaluate the effectiveness of predicate modifiers as a tool for expressing n-ary relations, we exposed users to both predicate modifiers and the traditional method of predicate reification. None of the 71 users employed predicate reification in representations of any of the five facts containing n-ary relations. Table 5 shows the overall correctness of the users' representations of those facts (using predicate modifiers).

Table 5. Use of predicate modifiers for n-ary relations.

  Item       Correct responses (out of 71)
  I               60.56%
  N               66.20%
  P               50.70%
  S               56.34%
  W               49.30%
  Average         56.62%

Users were less likely to express these facts correctly than the simpler items. However, in light of our previous study [8], which showed that people with minimal training are extremely unlikely to properly express n-ary relations using triples, the results are promising. Hypothesis H3A is soundly confirmed.

For the vast majority of test items, which contained only a single fact each, users did not attempt to (incorrectly) represent them using predicate modifiers. (No more than 2 out of 71 users tried this for any of those items.) However, for item K (“Peyton Manning passed for 206 yards, while Brett Favre threw for 315”) this did prove to be a common problem. This item actually contains two separate binary relations, but 25 out of 71 total users (35.2%) incorrectly applied predicate modifiers to try to express it. This finding may suggest that novice users can have difficulty determining whether a complex thought represents a single n-ary relation or a series of binary relations. If so, we argue that this only emphasizes the need to investigate more intuitive techniques for representing complex information. In any case, for simple sentences, hypothesis H3B is confirmed.

5.4 Results: Survey
Finally, our experiment ended with a 12-question survey in which participants answered reaction questions on a 6-point Likert scale. These measured user satisfaction with the system, the ease with which they could locate information, and similar aspects. Only two of the items showed a significant difference between groups (at α = 0.05):
  “It was easy to use the system to add new information.” The average responses on this item were: Group S = 4.9, Group ST = 5.5, Group SAT = 4.1. Considering “templates” and “attributes” to be two independent variables, a univariate ANOVA test confirms that a “statements only” interface has a beneficial effect on user perception of how easy it is to add data (p < 0.05).
  “I was confident that I added the information correctly.” The average responses were: Group S = 3.8, Group ST = 4.6, Group SAT = 3.9. The ANOVA test confirms that the template feature has a beneficial effect on user confidence in adding data (p < 0.05).
(Note that since we had no survey information from an “SA” group, i.e., attributes but no templates, we could not detect any possible interaction between variables.)
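The limitation noted above can be made concrete with a short worked equation. Treating templates and attributes as the two factors of a 2×2 design, and writing $\mu_g$ for the mean survey response of group $g$ (notation introduced here for illustration), the interaction is the difference between the template effect without attributes and the template effect with attributes. Substituting the reported means for the “easy to add” item gives the following; this is an illustrative sketch, not an analysis reported above.

```latex
\begin{aligned}
I &= (\mu_{ST} - \mu_{S}) - (\mu_{SAT} - \mu_{SA}) \\
  &= (5.5 - 4.9) - (4.1 - \mu_{SA}) \\
  &= \mu_{SA} - 3.5
\end{aligned}
```

Because $\mu_{SA}$ was not observed, any value of $I$ is consistent with the three measured means. An additive (no-interaction) model fits those three means exactly and simply predicts $\mu_{SA} = \mu_{S} + \mu_{SAT} - \mu_{ST} = 4.9 + 4.1 - 5.5 = 3.5$, leaving no degrees of freedom with which to test for an interaction.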
Although we had no a priori hypotheses regarding user preferences, this survey information seems noteworthy. Users appear to have a preference for a “statements only” interface with templates. This type of interface modestly enhances the user experience and raises confidence.

6. CONCLUSIONS
Novice end users, who are potential contributors to the Web of Linked Data, have substantial difficulties formulating knowledge in the format the Semantic Web requires. User interface design, therefore, is paramount. Our empirical testing has shed light on certain aspects of how such interfaces are used in practice, and how they should best be designed. These include:
1. Requiring users to group information about a resource according to its roles (types), and displaying a template of previously used predicates for each of those types, can help channel users towards the re-use of predicates that already exist, avoiding undesirable proliferation of synonyms. We also believe that roles and templates provide benefits in terms of more facile navigation and the implicit creation of domain, range, and type information. However, users are not consistent in their choice of roles, suggesting that some mechanism for encouraging consistency would be wise.
2. Users are very inconsistent in choosing to model an object as either a resource or a literal, and this appears to be true across a wide variety of types of objects. In many cases they simply make inappropriate choices. Moreover, users appear to be just as successful generating Linked Data when they use an interface that only supports resources. We therefore recommend that tools for authoring Linked Data not include literals in the interface.
3. A scheme for allowing users to express n-ary relations with modified predicates, rather than with traditional predicate

[7] Campanini, S., Castagna, P., and Tazzoli, R. “Platypus Wiki: a Semantic Wiki Wiki Web,” Proceedings of the 1st Italian Semantic Web Workshop: Semantic Web Applications and Perspectives (SWAP), Ancona, Italy, pp. 1-6, 2004.
[8] Davies, S., Zeitz, J., and Hatfield, J. “Addressing the Cognitive Difficulties of Expressing N-ary Relations in Semantic Web Data.” (Manuscript submitted for publication.)
[9] Kalyanpur, A., Parsia, B., Sirin, E., Grau, B., and Hendler, J. “Swoop: A web ontology editing browser,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 4, pp. 144-153, 2006.
[10] Kintsch, W. Comprehension: A Paradigm for Cognition, Cambridge University Press, 1998.
[11] Krötzsch, M., Vrandecic, D., and Völkel, M. “Semantic MediaWiki,” Proceedings of the 5th International Semantic Web Conference (ISWC06), Springer, pp. 935-942, 2006.
[12] Luczak-Rosch, M. and Heese, R. “Linked Data Authoring for Non-Experts,” Proceedings of the 2009 Linked Data on the Web Workshop (LDOW09), 2009.
[13] McBride, B. “Jena: A semantic web toolkit,” IEEE Internet Computing, vol. 6, pp. 55-59, 2002.
[14] Mitchell, T., Betteridge, J., Carlson, A., Hruschka, E., and Wang, R. “Populating the Semantic Web by Macro-reading Internet Text,” Proceedings of the 8th International Semantic Web Conference (ISWC09), Chantilly, VA: Springer-Verlag, pp. 998-1002, 2009.
[15] Noy, N., Grosso, W., and Musen, M. “Knowledge-acquisition interfaces for domain experts: An empirical evaluation of Protege-2000,” Proceedings of the 12th International Conference on Software and Knowledge Engineering, Chicago, USA, July 5-7, 2000.
     reification, can enormously increase their success in
                                                                      [16] Pietriga, E. “Isaviz: a visual environment for browsing and
     modeling such information.
                                                                      authoring rdf models,” Proceedings of the Eleventh
7.        ACKNOWLEDGEMENTS                                            International World Wide Web Conference, 2002.
We would like to thank Trillane Burlar and Christopher (Shane)        [17] Schaffert, S., Eder, J., Grünwald, S., Kurz, T., Radulescu,
Voisard for their inimitable development work, without which          M., Sint, R., and Stroka, S. “KiWi–A Platform for Semantic
this project would not have been possible.                            Social Software.” Proceedings of the Workshop on Semantic
                                                                      Wikis, in conjunction with The 6th European Semantic Web
8.        REFERENCES                                                  Conference (ESWC09), 2009.
[1] Anderson, J. Cognitive Psychology and its Implications            [18] Simpson, E. “Measurement of diversity,” Nature, vol. 163,
(Seventh Edition), Worth Publishers, 2009.                            1949, p. 688.
[2] Auer, S., Dietzold, S., and Riechert, T. “OntoWiki-A Tool         [19] Sowa, J. Conceptual structures: information processing in
for Social, Semantic Collaboration,” Lecture Notes in Computer        mind and machine, Reading, MA: 1984.
Science, vol. 4273, pp. 736-749, 2006.
                                                                      [20] Staab, S., Maedche, A. and Handschuh, S. “Creating
[3] Berners-Lee, T. “Linked Data – Design Issues,” 2006.              metadata for the semantic web: An annotation framework and
Available at: http://www.w3.org/DesignIssues/LinkedData.html.         the human factor,” Technical Report 412, Institute AIFB,
[4] Berners-Lee, T., Hollenbach, J., Lu, K., and Presbrey, J.         University of Karlsruhe, 2001, 2001.
“Tabulator Redux: Writing Into the Semantic Web,”                     [21] Suchanek, F., Kasneci, G., and Weikum, G., “Yago: a core
Proceedings of the 2008 Linked Data on the Web Workshop               of semantic knowledge,” Proceedings of the 16th international
(LDOW08), 2008.                                                       conference on the World Wide Web, p. 706-715, 2007.
[5] Bollacker, K., Evans, C., Paritosh, P., Sturge, T., and           [22] Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., and
Taylor, J. “Freebase: a collaboratively created graph database        Studer, R. “Semantic Wikipedia,” Proceedings of the 15th
for structuring human knowledge,” Proceedings of the 2008             international conference on the World Wide Web, Edinburgh,
ACM SIGMOD International Conference on Management of                  Scotland: ACM, pp. 585-594, 2006.
Data, Vancouver, Canada: ACM, 2008, pp. 1247-1250.
                                                                      [23] W.A. Woods, “What's in a link: Foundations for semantic
[6] Bollacker, K. Tufts, P., Pierce, T., and Cook, R. “A Platform     networks,” in D. G. Bobrow and A. Collins, eds, Representation
for Scalable, Collaborative, Structured Information Integration,”     and Understanding. Academic Press, New York, 1975.
Proceedings of the 6th International Workshop on Information
Integration on the Web, Vancouver, Canada, 2007.