User Interface Design Considerations for Linked Data Authoring Environments

Stephen Davies, Jesse Hatfield, Chris Donaher, Jessica Zeitz
University of Mary Washington
1301 College Ave
Fredericksburg, VA 22401
1-540-654-1317
{sdavies, jhatfiel, cdonaher, jzeitz}@umw.edu

ABSTRACT
If non-technical end users are to contribute to the Web of Data as they have to the Web of Documents, they must employ tools that enable them to do so. This challenge is not easy to meet, as formal knowledge representation is a daunting task for the uninitiated. Indeed, we have empirically observed that expressing anything but the most straightforward of facts in RDF-compatible format is extremely difficult for newcomers to do reliably.

This paper reports on a controlled experiment in which novices attempted to use a prototype Linked Data interface to both find and encode bits of everyday knowledge. The application presents a user-friendly veneer to the Semantic Web, manifesting the essential graph-based nature of the data model while shielding the user from the complexity of syntax. This allows us to study user behavior in attacking the deep, cognitive problem: breaking down knowledge into the triple-based structure required by RDF Linked Data. Our study sheds light on some of the key aspects of knowledge formulation that novices struggle with, and suggests several specific design approaches for Linked Data authoring environments that, as our experiment makes clear, address crucial issues.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Interaction styles; H.5.4 [Hypertext/Hypermedia]: User issues.

General Terms
Semantic Web, Linked Data, User Interface Design, Experimentation, Human Factors.

1. INTRODUCTION
A successful, global-scale Semantic Web presupposes large amounts of instance data available for machines to process. As Tom Mitchell summarized during his ISWC 2009 keynote address[14], there are essentially three ways to produce this: (1) humans entering structured information, (2) database owners publishing their data in RDF format, and (3) employing automated natural language processing techniques to “read” unstructured Web data.

One might suppose that the only major impediment to (1) is convincing the masses that they have an incentive to do this. But in addition to the issue of motivation, serious questions arise about novices' ability to generate Linked Data in the format required by the Semantic Web. Formal knowledge representation is difficult and error-prone for most non-technical people. It is a very different activity from writing in natural language, which is the way that most laypeople have contributed to the Web to date. Authoring Linked Data demands an unswervingly consistent naming scheme, an unprecedented level of exactitude, fluency with a new suite of concepts, and an adherence to a set of rigid and (to the layperson) seemingly arbitrary rules that run counter to the way most people think, let alone converse. Though some psychologists (e.g., [1,10,19]) have thought semantic networks to be reflective of the way human memories are encoded, one only has to watch a novice struggle with expressing even basic concepts in a graph-based knowledge structure to know that this activity is extremely challenging.

We believe that for non-specialists to be successful in contributing to the Web of Data, they must use tools designed to compensate for their weaknesses. The design of such tools should be informed by empirical studies that illuminate how target users actually go about generating Linked Data, so that strengths can be maximized, weaknesses complemented, and unfruitful trends redirected.

The immediate goal of the work presented in this paper is not so much to design the ultimate Linked Data authoring environment as to empirically verify which aspects of such environments might be beneficial or harmful. By studying user behavior under simulated conditions, and observing which specific aspects of the Linked Data authoring process prove to be obstacles, we illuminate the nature of the problem and offer experimentally driven guidance on how to make end users successful.

The remainder of this paper is organized as follows. First, we describe related work in user studies of knowledge formulation processes and tools. Then, we introduce OKM¹, the prototype Linked Data authoring tool used in our experiments, highlighting key features whose viability we focused on in our study. We then describe the nature of our usability experiment, and present and interpret a quantitative analysis of the results. Finally, we summarize our findings and make generalizations and recommendations for future interfaces to Linked Data applications.

¹ OKM is a recursive acronym which stands for “OKM Knowledge Management,” and is pronounced as “Occam.” The prototype application is open-source and publicly accessible at http://sourceforge.net/projects/okm.

Copyright is held by the author/owner(s).
LDOW2010, April 27, 2010, Raleigh, USA.

2. RELATED WORK
A wide array of tools has appeared in the last several years to help users in the RDF generation process. These include everything from semantic wikis (e.g., Platypus[7], Semantic Mediawiki[11], IkeWiki[17]) to semantic annotation tools (e.g., Loomp[12], OntoAnnotate[20]) to RDF editors (e.g., OntoWiki[2], Tabulator[4], IsaViz[16]) to full-blown ontology management environments (e.g., Protege[15], Swoop[9]). With few exceptions, however, published reports on these tools have not included usability studies to evaluate their effectiveness, or to identify the cognitive barriers users may face when using them. The result is a body of literature that contains many innovative and potentially useful user interface ideas, but with no core set of principles whose effectiveness has been proven and which can guide further work.

We mention here two notable efforts which did include illuminating usability studies. One was conducted by Staab et al.[20], who performed an in-depth analysis of the behavior of nine experimental subjects who used the OntoAnnotate semantic annotation tool. Their primary measure was inter-annotator agreement; that is, the degree to which different participants independently annotated a page in the same way. Their conclusion, roughly speaking, was that novices to the Semantic Web, operating in a domain where they are not experts, will not in general produce high-quality structured knowledge, or at least not knowledge on which they agree with one another. If nothing else, this confirms the difficulty of the problem laypeople face.

Noy et al.[15], on the other hand, performed an experiment in which military domain experts used a version of Protege-2000 with domain-specific extensions in order to perform specific knowledge acquisition tasks. The structure of the knowledge base given to participants was very detailed, and comprised a precisely specified class hierarchy containing concepts (e.g., types of combat units) that participants used on a daily basis. Unlike Staab et al.'s, Noy et al.'s conclusion was optimistic: these domain experts, with 1-2 hours of training but no computer science background, were in fact able to effectively use a large knowledge base that concerned a domain with which they were intimately familiar. The contrast between these two studies' outcomes testifies to the impact that domain expertise and domain-specific tools can have. The subjects in Staab et al.'s study, who used a general tool on general subject matter, had much greater difficulty. Clearly the more challenging user interface problem is to equip novices with a tool that is not custom-tailored to any particular subject matter, but which facilitates the proper construction of valid Linked Data on any topic, even one in which users do not begin with expert-level conceptions.

The setting we explore is more reminiscent of Staab et al.'s study, since we are focusing on laypeople (not domain experts) who are tasked with formulating generalized, open-ended knowledge. Our work differs from each of these efforts in that we are examining the effect and usage of specific user interface features, with the goal of discovering how a general Linked Data editor would best be designed. In particular, we analyze user behavior in choosing resources versus literals to represent information, the efficacy of employing types and templates in the interface to steer users towards data consistency, and alternative ways to express n-ary relations. None of these UI aspects has, to our knowledge, been empirically studied in a focused, experimental setting.

3. OKM FEATURES

3.1 Basic Design
OKM's primary purpose is to serve as a testing bed for analyzing how laypeople interact with Linked Data tools, and its basic design is common to many state-of-the-art RDF and ontology editors. This commonality is key in relating OKM to tools currently in use by the Semantic Web community; with it, we hope to generalize the results we obtain from empirical testing to Linked Data authoring as a whole.

For instance, like OntoWiki[2], Tabulator[4], Kiwi[17], Semantic Wikipedia[22], and many other tools, OKM's pages are “resource-centric” in that each page represents a single resource, displaying all the properties relating to that resource. Hyperlinks to related resources can be used to traverse the site. As with Freebase[5], users primarily interact with the system in terms of human-readable names (HRNs) rather than full URIs. At resource creation time, OKM auto-generates a globally-unique URI for that resource (scoped to the domain name of the OKM server), but users continue to work with HRNs in order to diminish screen clutter and enable more focus on semantics than syntax.

Users can add datatype or object properties to a resource directly from its page. In the interface, OKM refers to datatype properties (whose values are literals) as “attributes” and object properties (whose values are resources) as “statements.”² (We will use this terminology throughout the remainder of this paper.) The use of two terms (instead of calling everything a “triple”) is intended to help the user better appreciate the distinction between them, since they are created, presented, and navigated differently. If the user chooses to add an “attribute,” the property value will be interpreted as a primitive data type. If the user chooses to add a “statement,” the property value will be interpreted as the HRN of another resource. For statements, the user can specify an existing resource in the system as the object – at which point the new resource is effectively “stitched in” to the rest of the graph – or else refer to a resource which does not yet exist, which will implicitly create that resource.

Users can also search the system for resources by typing in a search box that autocompletes based on HRNs, or any portion thereof (e.g., typing “lin” will match a resource whose HRN is Abraham Lincoln.) This functionality is of course common to innumerable tools today, from Freebase[5] to IsaViz[16] to non-semantic-web tools like Wikipedia and the Google search interface. Also, an explicit “create” box allows resources to be created from scratch, and not (initially) connected to anything.

Again, since this design is similar in spirit to that of many tools in existence today, we believe that empirical findings based on OKM's interface will be of broad interest to the community of Linked Data researchers studying user interfaces.

² We chose these words based on survey feedback from a previous experiment[8] in which users were asked for the most intuitive terms for the two concepts.

Figure 1. The basic OKM interface, in “view” mode. (SAT version.)

3.2 Linked Data Publishing
OKM stores all information that the user creates in a local Jena RDF triple store[13]. Appearing in the upper corner of every OKM page is a “Publish” link which, if pressed, will generate Linked Data for the currently-displayed resource in RDF/XML format. This Linked Data is stored in a file in a configurable location on the web server that is hosting the OKM installation. It can then be accessed over the Web by dereferencing the URI that OKM auto-generated for the resource, according to Linked Data principles.[3] Note that the RDF/XML file will contain a serialization of (1) all triples for which the currently-displayed resource is a subject, and (2) rdfs:seeAlso links for the URIs of resources that appear as the subject of a triple for which the currently-displayed resource is an object.

If the user presses the “Publish” link while viewing the OKM home page, Linked Data for all resources in the local system will be generated. The entire knowledge base will thus be globally exposed to Linked Data consumers.

In this way, Semantic Web amateurs can be empowered to contribute to the Linked Data movement by utilizing a tool with a low barrier to entry and which shields them from the syntactic complexities of RDF. Note that the current version of OKM does not support “round-trip” knowledge creation whereby existing Linked Data (and ontologies) can be imported into the tool. This feature was postponed since it did not bear upon our immediate experimental concern; in future studies, however, we plan to implement this and study user behavior in interacting with a larger, pre-existing knowledge space (in which there is greater urgency to find and re-use existing resources.)

3.3 Experimental Features
Supplementing this normative user interface are three atypical features, which formed the focus for most of the investigative effort described in this paper. We hypothesized that each of these changes to the pseudo-standard user interaction paradigm would prove beneficial to novices attempting to interact with Semantic Web data, and for different reasons.

3.3.1 Roles and Templates
Rather than presenting all properties of a given resource in one long display, OKM encourages – and in fact, mandates – organization of these properties according to the resource's “roles.” A role is essentially an rdfs:Class to which the resource belongs, and which acts as the rdfs:domain (or rdfs:range) of the properties relevant to that class. Consider the screenshot in Figure 1. Here, the “Leonardo da Vinci” resource (which of course has a unique URI but which is presented to the user in terms of its HRN, as described above) has three roles: Artist, Person, and Scientist. Each role is manifested as its own box, with the relevant statements as contents. A given triple about Leonardo da Vinci will appear in the role box which represents the domain for that triple (or, if Leonardo da Vinci is the object rather than the subject of the triple, in the role box which represents the range. The “Giorgio Vasari” triple is an example of this latter case.)

In order to add an RDF triple to the system, the user must choose one of the resource's roles (or add a new role) which will serve as the domain of the triple, and then add the triple in the corresponding role box. The user begins this process by clicking on the “Edit” link at the top of the page, thereby putting the page in “edit mode.” (See Figure 2 for an example.) The role boxes then acquire buttons labeled “+Attribute” and “+Statement,” which can be used to add attributes or statements to that role box. The user can then type the name of a predicate and a value. An autocomplete function assists the user with both inputs, offering to match predicates already in the system, and (in the case of statements) HRNs of resources already in the system. It is perfectly permissible, however, for the user to type the name of a new predicate and/or the name of a new resource, in which case the new item is implicitly created. The new predicate is automatically given a domain based on the role box it was added to, and a range based on the role of the object value it was given. (For object resources with multiple roles, the “Set Role” button can be used to select which of the resource's roles should be the range of the predicate.) From that point forward, the system incorporates the new predicate into its ever-evolving schema.

One important aspect of roles is that when in edit mode, a template appears within each role box that displays the predicates already known to have that role as a domain. Using these templates is similar to inserting data in Freebase's type-based editing model[6]. In Figure 2, note the predicates “dimensions,” “period,” and “influenced” which appear in grey. These predicates – which are absent when the resource is being seen in “view mode” – appear in the box because at least one other resource with the “Painting” role has a triple involving each of these predicates. Pressing the “Add Value” button next to a grey item will prompt the user for a value for that item. In this way, the template suggests to the user possible predicates that are consistent with the schema that exists thus far.

Figure 2. The basic OKM interface, in “edit” mode. (SAT version.)

In any fairly complex knowledge base, we can predict that most users will be unable to keep track of all the predicates in use and will inevitably use different predicates to represent the same semantic concept. OKM's templates are designed to help guide users into editing resources in such a way that they stay within the current schema, while not constraining users from adding to that schema.

In summary, then, roles are intended to provide three benefits:

1. They lend organization to the display when a resource has many triples, in order to make information easier to find and enter.

2. They ensure richer data (with domain, range, and type information) than novice users would ordinarily produce. It seems likely that when authoring only simple triples, most novice users would not bother to assign types to their resources, nor domains and ranges to their predicates. (At least, it is unlikely that they would consistently do this.) With OKM, however, the act of assigning types, domains, and ranges is built in to the very process of creating triples, making it convenient to do and impossible to avoid.

3. They provide a template of relevant predicates that is easy for users to fill out. This provides not only instantaneous ease of use, but promotes long-term data consistency.

We present benefits 1 and 2 without proof. Later in this paper, we provide an in-depth empirical analysis to judge the efficacy of benefit 3.

3.3.2 The Elimination of Attributes (Literals)
The flexibility that RDF offers in supporting both resources and literals as object values is a mixed blessing. On the one hand, it presents an expressive modeling device. The objects of the triples “John marriedTo Sally” and “John weightInPounds 175.4” seem inherently different: “Sally” is presumably a bona fide resource in her own right, with other triples expressing information about her, whereas “175.4” intuitively seems like a primitive piece of raw data, undeserving of resource status. Allowing authors to designate an object as one or the other affords the opportunity to express this subtle aspect of the object.

On the other hand, the existence of the distinction means that authors are forced to choose between the two alternatives, and the choice is not always easy to make. Consider triples like “BeverlysToyota color red,” “Charlie bornIn 1982,” and “Candice schoolYear sophomore.” The object values “red,” “1982,” and “sophomore” might be considered literal pieces of data, as with the above weight example, or as first-class resources. Anyone who has composed RDF for any length of time knows that this choice presents itself at every turn, and that in some cases it feels almost arbitrary.

Our work presents two contributions toward better understanding this phenomenon and how to best handle it. First, by creating a system that lowers the barrier of entry for the creation of RDF, as well as a system for creating both statements and attributes, we can observe how uninitiated users tend to differentiate between the two in practice. Later in this paper we present findings that reveal user tendencies in choosing between statements and attributes for specific types of information, and an analysis of the degree of consistency laypeople exhibit in this choice.

Second, we explore the effects of an RDF editor in which attributes are simply eliminated. It is possible, of course, to completely do away with the concept of literals if one is prepared to accept elements like “175.4” as resources. This is one way of dispensing with both the angst users face in making the decision, and also the inconsistency that can result when users make different choices: simply take away the choice altogether. This may seem like a heavy-handed solution, but it is not without theoretical merit. Consider that more than one prominent cognitive psychologist (e.g., [1, pp.125-7; 10, pp.34-92; 23]) has formulated a knowledge representation theory based on something akin to semantic networks, yet found no need to differentiate between resources and literals. One kind of node is all that comprises these knowledge structures, which suggests that a “resources only” network is in fact sufficient to encode human knowledge. And it places the burden of proof rather on those who argue for the existence of two distinct kinds of entities.

As described below, we deployed to experimental subjects not only the version of OKM depicted in Figures 1 and 2, but also versions in which attributes were completely eliminated. The “+Attribute” button was removed from all displays, which effectively forced users to model everything as statements. We then compared accuracy, consistency, and user satisfaction between the different versions.

3.3.3 Predicate Modifiers
Lastly, OKM allows users to construct n-ary relations without explicitly using reification. This feature was inspired by a recent project in which we conducted a pencil-and-paper based experiment[8]. In this study, young adults with no previous Semantic Web experience were asked to construct knowledge representations (both visually and textually) corresponding to English sentences. Some of these sentences contained facts which were inherently n-ary: “Muhammad Ali fought Joe Frazier in Detroit,” for example. (This statement relates three entities and hence cannot be expressed as simple subject-predicate-object triples without reifying the verb.) Our participant pool was divided so that half of them were shown solutions to such sentences using traditional reification techniques: first, create a resource representing the verb (“AliFrazierDetroitFight,” perhaps) and then attach the other resources to it with predicates like “participant” and “location.” The other half of the participant pool was instead shown solutions involving predicate modifiers: they were permitted to break outside the strict triple scheme and augment a triple with further information indented beneath it. (This is the scheme supported by the Yago knowledge model[21].) To illustrate, users could express the above sentence textually as:

MuhammadAli fought JoeFrazier
    in Detroit

This is really nothing more than a shorthand notation for treating the first triple as the subject of a second triple, but it proved to have an enormous impact on user success. (As an example of the size of the effect, for one of the items 62% of participants were able to correctly express the sentence using predicate modifiers, compared with 14% using traditional reification.) The overall conclusion is that end users can be far more successful in constructing n-ary relations when enabled to employ predicate modifiers than when they are forced to express them as reified triples.

Guided by these findings, we implemented a predicate modifier scheme in OKM. An example is the “Leonardo da Vinci painted Mona Lisa” fact in Figure 1. Note that “with: oils” is an attribute, and “for Lisa del Giocondo” is a statement, and that both are indented underneath the “painted” triple. Users create such indented facts by pressing the “+Attribute” or “+Statement” buttons next to a triple, rather than at the top of the role box (refer to Figure 2.) When generating Linked Data, OKM converts these indented facts into traditionally reified triples, so that the knowledge is compatible with all current Semantic Web tools. From a user interface perspective, however, users never see the complexities of reification: they view and edit n-ary relations in terms of the much more intuitive predicate modifiers.

4. EXPERIMENT

4.1 Hypotheses
We formulated the following hypotheses to evaluate experimentally.

Regarding the “roles and templates” feature:

• H1A – The addition of a “roles and templates” feature will significantly increase laypeople's ability to correctly formulate Linked Data: i.e., the RDF they generate will make more sense semantically.

• H1B – The addition of this feature will increase the likelihood that laypeople will consistently formulate Linked Data: i.e., they will more often reuse appropriate predicates that already exist.

• H1C – Users will in general employ the role feature properly by selecting appropriate roles (and thus incidentally contribute meaningful domain, range, and type information.)

• H1D – Users will in general select roles consistently with one another (i.e., if two users separately encode the same bit of knowledge, they are very likely to select the same role under which to create the triple.)

Regarding the “elimination of attributes” feature:

• H2A – If given the choice of creating an attribute or a statement for a given bit of knowledge, there will be no predictable consensus among a group of laypeople. They will very often make inconsistent choices with one another, leading to gross inconsistencies in a collaborative knowledge base.

• H2B – Laypeople who employ a Linked Data interface that eliminates attributes altogether will suffer no disadvantages: the data they generate will be as correct as the data of those who have both statements and attributes available.

Regarding the “predicate modifiers” feature:

• H3A – Given examples of both predicate reification (traditional) and predicate modifiers (as described above), and the choice to use either technique to express n-ary relations, users will choose the latter significantly more often, and have more success in doing so.

• H3B – The presence of the predicate modifier feature will have no significant negative impact on laypersons' knowledge generation: i.e., it will rarely if ever be misapplied to produce errant knowledge.

4.2 Participants
Our participant group consisted of 71 college students ranging from 18 to 22 years of age and contained roughly an even split between genders. All students were enrolled at the University of Mary Washington during the Spring 2010 semester and were of many diverse majors.

4.3 Procedure and Materials
Participants took the one-hour experiment using the Firefox Internet browser on either a Windows or UNIX workstation. A ten-minute demonstration and explanation of OKM was given, and then each participant received an experiment packet and was directed to a URL (unique for each participant) that housed an OKM deployment with a pre-fabricated knowledge base containing about 130 resources and 150 predicates. The packet included 10 questions to be answered using this knowledge base (Part 1) and 24 facts to be added to it (Part 2). The final part of the packet (Part 3) was a survey to help us better analyze how the participants reacted to the system.

Part 1 questions ranged from easy to difficult depending on how difficult it was to find the information in the system. Easy questions were ones where the participant had to locate a specific resource page in the system and the answer was directly on that page. For example, “How tall is Jason Thompson?” The “Jason Thompson” resource existed in the system and the answer could be found on that page. More difficult questions forced the participant to view multiple pages and traverse links within the pages to locate the answer. For example, “What ballpark does Todd Helton's baseball team play in?” The participant had to first find the “Todd Helton” resource page, and then find and click the link to the “Colorado Rockies” page in order to find the name of the sports facility in which the team played. Part 1 also acted as practice to help the participants become more comfortable and aware of the system and how it was organized.

Regarding hypothesis H3A, it is important to note that the last two Part 1 items involved n-ary relations, but that the pre-fabricated knowledge base had encoded one of them using predicate modifiers, and the other using predicate reification. The two items had nearly identical structure: “For what novel did Ernest Hemingway win the Pulitzer Prize?” and “For what film did Martin Scorsese win the Academy Award for Best Director?” Hence, in answering these questions, all participants witnessed properly encoded examples of both predicate reification and predicate modifiers. They were then presumably not biased in either direction when beginning Part 2, which required the encoding of five n-ary relations among its 24 items.

Part 2 had the participants add data to the knowledge base. This part had a range of difficulty levels just as Part 1 did. The facts were presented as sentences with each sentence having one to three small facts within it. For example, “Madison Square Garden is located in New York City” has one fact: the fact that Madison Square Garden is in New York City. “Mark David Chapman assassinated John Lennon on December 8, 1980 at the Dakota Apartment Complex,” on the other hand, has three facts: the fact that Mark David Chapman assassinated John Lennon, and the date and the place of the assassination. (Note that this is an n-ary relation.) Resources referred to in Part 2 did not always exist in the pre-fabricated knowledge base, requiring the participant to create a resource before adding the fact.

Our participants were split into four groups based on which version of the program they used. The four versions were:

ST (18 participants) – A “statements only” interface (i.e., no attributes) that provided role-based template information when editing resources (as described above.) This is the version of the interface which we hypothesized would be the most effective, since it incorporated all three of the experimental features described above.

S (19 participants) – A “statements only” interface with no templates. This version was identical to ST, except that when in edit mode, the greyed-out “suggested” predicates (such as “dimensions,” “period,” and “influenced” in Figure 2) would not appear.

SAT (18 participants) – A “statements + attributes” interface with templates. This version was identical to ST, except that the “+Attribute” buttons were included so that users could choose between creating attributes or statements. (This is the version of the interface depicted in Figures 1 and 2.)

SA (16 participants) – Finally, in order to test hypothesis H2A, a number of participants received a more “traditional” version of OKM that permitted both statements and attributes, but provided no templates.

We then evaluated our hypotheses by judging the contents of the Linked Data knowledge bases that users produced while carrying out the actions required in Part 2. We did this in the following way:

• H1A – compare groups S and ST for correctness.

• H1B – compare groups S and ST on the items for which an appropriate predicate already existed in the pre-fabricated knowledge base, to determine whether they used that predicate.

• H1C – judge all groups on how often the role under which they chose to put a triple was conceptually correct. This was admittedly somewhat subjective, but in practice there was very little debate among the graders (the four authors of this paper) as to whether a role was correct.

• H1D – for each item, evaluate the frequency with which participants chose the same role using Simpson's diversity index[18].

• H2A – for participants in groups SAT and SA, evaluate the degree of consensus participants exhibited in choosing attributes or statements to represent the information.

• H2B – compare groups SAT and ST for correctness.

• H3A – for the five Part 2 items requiring n-ary relations, count the number of times participants (in all groups) used predicate reification versus predicate modifiers to correctly encode them.

• H3B – for the nineteen Part 2 items that did not require n-ary relations, count the number of times participants mistakenly used the predicate modifiers feature and generated nonsensical Linked Data as a result.

5. RESULTS

5.1 Results: Roles and Templates

5.1.1 The Effect of Templates
We evaluated our template-related hypotheses using nine specific items. For each of these items, the system contained a predicate, associated with an appropriate role, that users should have noticed and could have selected using the template. However, these items were in two groups. For six of them (items J, K, L, P, W, and X) the already existing predicate was in fact semantically appropriate for the item. For instance, for item J, “David Beckham scored 27 goals,” the pre-fabricated knowledge base contained the predicate “goals” for the “Soccer Player” role. Therefore, it would have been appropriate and consistent for ST users to select “goals” from the template to record this item (as opposed to creating their own equivalent predicate such as “scored” or “numberOfGoals.”)

For the other three items, however, the already existing predicate was not semantically appropriate for the item. We called these items “traps.” For example, the pre-existing predicate that ST users saw for item O, “The Matrix's gross earnings were $90 million,” was “net earnings.” We included these three items because we wanted to measure the degree of danger templates may introduce in leading users to choose pre-existing predicates that are in fact not appropriate.

We evaluated templates in two ways. First (hypothesis H1A) we compared the total correct responses between the ST and S groups for the nine items, treating each user's response for each item as a separate trial. Considering all nine, 70.37% of ST users' representations (114 out of 162) were correct, as opposed to 67.25% (115 out of 171) of S users' representations. This difference was not statistically significant (p > 0.01 by Fisher's exact test). This indicates that templates do not improve correctness of user-generated data, thereby refuting hypothesis H1A. (Interestingly, when considering only the six non-trap items, the ST group got 65.74% correct compared with S's 61.40%; for the trap items, ST got 79.63% and S 78.95%, neither of which was statistically significant. It appears, then, that templates have no impact on correctness, regardless of whether the predicates in question are in fact appropriate.)

We also studied the impact templates have on consistency of data (H1B); i.e., the likelihood that data authors would re-use an appropriate predicate already existing in the system as opposed to creating a synonymous one. The effects of templates on the predicates used for the nine items are summarized in Table 1.

Table 1. Effect of role templates on predicate usage.

            ST: predicate used    S: predicate used
Item        Existing    New       Existing    New       P-value
J               7        11           0        19       0.0031
K              10         8           0        19       0.0001
L              15         3           4        15       0.0002
P               1        17           0        19       0.4865
W               5        13           1        18       0.0897
X               3        15           0        19       0.1050
(Traps:)
C              18         0          19         0       1.0000
O              12         6          19         0       0.0080
V              17         1          19         0       0.4865

The results show that users in the ST group were significantly (p < 0.01 by Fisher's exact test) more likely to (correctly) use the existing predicate in three of the six non-trap cases, and to (incorrectly) use it in one of the three trap cases. This is a mixed result. Evidently, templates effectively promote consistency for some facts but not for others, and they mislead users into semantic incorrectness for some facts but not for others. Hypothesis H1B appears to be confirmed only in certain cases.

When we probe the specific items to discover which kinds of items templates assist, we discover that it greatly depends on word choice. Templates were not shown to be helpful for items P (“The song 'Stairway to Heaven' featured Jimmy Page on guitar”, with predicate “plays” defined for role “Musician”), W (“John Entwistle played bass on the song 'Behind Blue Eyes'”, with predicate “plays” defined for role “Musician”), and X (“Paul McCartney wrote the song 'Maybe I'm Amazed'”, with predicate “composed” defined for role “Musician”). The

role, and items for which the user would have to add a role to the resource in order to make a reasonable choice.

The results show that users were very likely (92.81%) to make a reasonable choice if the relevant objects already had an appropriate role, but less likely (73.32%) to add one themselves (p < 0.05 by a t-test). The diversity of roles was high for both kinds of facts, although the second group was more diverse (p < 0.05). This indicates that users are often unable to choose correct roles, and are not reliably consistent with one another. The fact that templates were successful at helping users enter semantically correct data despite these difficulties suggests that users better guided by ontologies might experience greater benefits from a system that incorporates type and schema information.

Table 2. Correctness and consistency of role choices.
wording of items P and X differs from the defined predicates, while the tense differs for item W. These results suggest that Object Already Had Reasonable Role users are less likely to use templates when it would require Item % Correct # of Roles D restructuring the sentence or using different terminology. D 98.51% 4 0.19 E 76.81% 3 0.36 In a real-world setting, of course, users are not translating H 95.59% 4 0.17 sentences into Linked Data, but “mental knowledge” into Linked K 94.37% 4 0.13 Data. We can only speculate as to the size of this effect for M 100.00% 1 0.00 mental knowledge, but it seems reasonable to assume that if a N 88.57% 16 0.65 user wants to encode a fact, and has a certain phrasing in mind, P 98.53% 3 0.31 they will succumb to the same pitfall that our testers did. S 75.71% 16 0.67 Note that two requirements must be met in order for a user to T 88.73% 6 0.72 take advantage of a template: they must (1) select the proper U 95.59% 4 0.53 role (i.e., the role that is the domain for the predicate), and they V 98.55% 3 0.18 must (2) observe and decide to use the relevant predicate for that W 97.06% 3 0.06 template. In cases where the user failed to use templates, neither X 98.57% 4 0.08 of these factors was entirely to blame. For items P and J, for Average 92.81% 5 0.31 example, 81.8% and 82.4% of the failures were due to choosing the wrong role; for items W and X, on ther other hand, 100.0% User Forced to Add Role and 86.7% were due to not using the right predicate within the Item % Correct # of Roles D (correct) role. It appears possible to fail at either. F 44.93% 4 0.53 Item O (“The Matrix's gross earnings were $90 million”) G 69.23% 6 0.48 illustrates the potential negative impact of templates. 
Six out of J 89.55% 6 0.58 eighteen ST users chose to use the existing predicate “net K 92.31% 6 0.42 earnings.” This result indicates a risk that users may select L 66.20% 9 0.53 incorrect predicates when they are lexically similar but O 75.36% 13 0.47 semantically different from the phrases they intend to use. Q 77.14% 9 0.43 However, this risk can be expected to diminish with substantial R 71.83% 4 0.53 domain knowledge, which equips users to correctly differentiate Average 73.32% 7 0.50 between similar terms. Overall, these findings suggest that templates assist with both 5.2 Results: Elimination of Attributes consistency and correctness in predicate usage, despite not being To determine whether the presence of attributes in the system effective in all cases. Templates are helpful when the user influences users' ability to represent data, each item was rated selects the appropriate role and when the existing predicate is for correctness. A response was considered correct if it consistent with the user's intended phrasing of the information. accurately conveyed the information given in the text, was consistent with the graph-based data model, and was associated 5.1.2 The Viability of Roles with a reasonable role. Table 3 compares the SAT and ST In order to evaluate the viability of roles, we examine both groups for overall correctness of data entry. correctness (H1C) and consistency (H1D) of roles chosen for 20 items. We judged correctness by the relevance of the chosen The results show no significant difference between the two roles to the information entered. For example, we considered groups (by Fisher's exact test, α = .01). (It is also the case that appropriate roles for item L – “Deion Sanders has stolen 35 no significant difference existed on any one item.) 
This finding bases” – to include “Baseball Player” and “Athlete” but not is consistent with our hypothesis that the presence of attributes “Football Player” or “Person.” Inconsistency is measured using in the system would not influence the quality of data produced Simpson's Index of Diversity (D = 1 - Σ(pi2), where pi is the by users. Thus hypothesis H2B is confirmed. proportion of users who chose role i) for each item. An index of We also hypothesized that, when forced to choose whether to 0 indicates that all users chose the same role. Larger values model a fact as a statement or as an attribute, users would indicate more distinct roles chosen and a more even distribution behave inconsistently with one another. Table 4 shows the data between roles. Both are shown in Table 2, divided into two we collected to evaluate this hypothesis. The 24 sentences groups: items for which the resource already had an appropriate contained 30 atomic facts, which are divided into five categories 5.3 Results: Predicate Modifiers based on whether their objects are: proper nouns (14 facts, such To evaluate the effectiveness of predicate modifiers as a tool for as “New York City”), common nouns (5 facts, such as “piano”), expressing n-ary relations, we exposed users to both predicate numeric values (9 facts, such as “186 pounds”), dates (3 facts, modifiers and the traditional method of predicate reification. such as “April 4, 2008”), and years (1 fact, “2004”). None of the 71 users employed predicate reification in Table 3. Impact of attributes on correctness. representations of any of the five facts containing n-ary relations. Table 5 shows the overall correctness of the user's Group Correct Incorrect % Correct representations of those facts (using predicate modifiers). SAT 336 96 77.78% Table 5. Use of predicate modifiers for n-ary relations. ST 353 79 81.71% p = .18 Correct responses Item (out of 71) I 60.56% Table 4. 
Consistency of the attributes vs statements choice N 66.20% among various types of data. P 50.70% Category Number of items % attributes S 56.34% Common nouns 5 27.10% W 49.30% Date 3 56.50% Average 56.62% Numeric 7 74.30% Proper nouns 14 16.20% Users were less likely to express these facts correctly than Year 1 20.70% simpler items. However, in light of our previous study[8] that showed that people with minimal training are extremely unlikely Our users demonstrated the highest consistency for proper nouns to properly express n-ary relations using triples, the results are promising. Hypothesis H3A is soundly confirmed. and the lowest for numeric values. However, it is clear that for none of the five types can consistency be counted on. No matter For the vast majority of test items, which contained only a what type of fact is being represented, different novice users single fact each, users did not attempt to (incorrectly) represent will encode it in different ways – some as attributes, some as them using predicate modifiers. (No more than 2 out of 71 users tried this for any of those items.) However, for item K (“Peyton statements – leading to basic inconsistencies in the resulting Manning passed for 206 yards, while Brett Favre threw for structure of the Linked Data. 315”) this did prove to be a common problem. This item Evaluating consistency in the abstract is difficult, but we note actually contains two separate binary relations, but 25 out of 71 certain items that show a surprising lack of consensus. For total users (35.2%) incorrectly applied predicate modifiers to instance: try and express it. This finding may suggest that novice users can have difficulty determining whether a complex thought Item F - “Kelly Witt is a freshman.” (14 attrs, 18 stmnts) represents a single n-ary relation, or a series of binary relations. Item S - “Michael Abram stabbed George Harrison on Dec. 
If so, we argue that this only emphasizes the need to investigate 30th, 1999” (the date: 19 attrs, 12 stmnts) more intuitive techniques for representing complex information. In any case, for simple sentences, hypothesis H3B is confirmed. Item Q - “Deion Sanders hit 7 home runs” (24 attrs, 9 stmnts) 5.4 Results: survey Users exhibit no strong consensus regarding how best to model Finally, our experiment ended with a 12-question survey in which participants answered reaction questions on a 6-point these pieces of information, and many others, confirming Likert scale. These measured user satisfaction with the system, hypothesis H2A. the ease with which they could locate information, etc. Only two For some items, a substantial number of users made very of the items demonstrated any significance between groups (to unintuitive choices. Item E, clearly a numeric value (“Ryan an α of 0.05): Medina's GPA is 2.79”) was represented as a statement 11 out of “It was easy to use the system to add new information.” 33 times (33.3%). Even more problematic is the tendency to The average responses on this item were: Group S=4.9, represent proper nouns as attributes. For item A (“Madison Group ST=5.5, Group SAT=4.1. Considering “templates” Square Garden is located in New York City”) 9 out of 30 users and “attributes” to be two independent variables, a chose to represent New York City as an attribute (30.0%) And univariate ANOVA test confirms that a “statements only” for item N (“Mark David Chapman assassinated John Lennon on interface has a beneficial effect on user perception of how December 8, 1980 at the Dakota Apartment Complex”), 17 out easy it is to add data. (p < 0.05). of 31 users represented Dakota Apartment Complex as an “I was confident that I added the information correctly.” attribute (54.8%). We argue that representing proper nouns like The average responses were: Group S=3.8 Group ST=4.6, “New York City,” about which many things on the Semantic Group SAT=3.9. 
The ANOVA test confirms that the Web are likely to be said, as anything other than resources is a template feature has a beneficial effect on user mistake, and that untrained users are likely to make that mistake confidence in adding data. (p < 0.05) often when given the choice. In the absence of a compelling (Note that since we had no survey information from an “SA” reason to do so, we recommend that systems not force users to group – i.e., attributes, but no templates – we could not detect make the choice between resources and literals. any possible interaction between variables.) Although we had no a priori hypotheses regarding user [7] Campanini, S., Castagna, P., and Tazzoli, R. “Platypus Wiki: preferences, this survey information seems significant. Users a Semantic Wiki Wiki Web,” Proceedings of the 1st Italian appear to have a preference for a “statements only” interface Semantic Web Workshop Semantic Web Applications and with templates. This type of interface modestly enchances the Perspectives (SWAP), (Ancona, Italy), 2004, pp. 1-6. user experience and raises confidence. [8] Davies, S., Zeitz, J., and Hatfield, J. “Addressing the Cognitive Difficulties of Expressing N-ary Relations in 6. CONCLUSIONS Semantic Web Data .” (Manuscript submitted for publication.) Novice end users, who are potential contributors to the Web of Linked Data, have substantial difficulties formulating [9] Kalyanpur, A., Parsia, B., Sirin, E., Grau, B., and Hendler, J. knowledge in the format the Semantic Web requires. User “Swoop: A web ontology editing browser,” Web Semantics: interface design, therefore, is paramount. Our empirical testing Science, Services and Agents on the World Wide Web, vol. 4, has shed light on certain aspects of how such interfaces are used pp. 144-153, 2006. in practice, and should be best designed. These include: [10] Kintsch, K. Comprehension: A Paradigm for Cognition, 1. 
Requiring users to group information about a resource Cambridge University Press, 1998. according its roles (types), and displaying a template of [11] Krötzsch, M., Vrandecic, D., and Völkel, M. “Semantic previously used predicates for each of those types, can help MediaWiki,” Proceedings of the 5th International Semantic channel users towards the re-use of predicates that already Web Conference (ISWC06), Springer, pp. 935–942, 2006. exist, avoiding undesirable proliferation of synonyms. We also believe that roles and templates provide benefits in [12] Luczak-Rosch, M. and Heese, R. “Linked Data Authoring terms of more facile navigation and the implicit creation of for Non-Experts.”Proceedings of the 2009 Linked Data on the domain, range, and type information. However, users are Web Workshop (LDOW09), 2009. not consistent in their choice of roles, suggesting that some [13] McBride, B. “Jena: A semantic web toolkit,” IEEE Internet mechanism for encouraging consistency would be wise. Computing, vol. 6, pp. 55-59, 2002. 2. Users are very inconsistent in choosing to model an object [14] Mitchell, T., Betteridge, J., Carlson, A., Hruschka, E., and as either a resource or a literal, and this appears to be true Wang, R. “Populating the Semantic Web by Macro-reading across a wide variety of types of objects. In many cases Internet Text,” Proceedings of the 8th International Semantic they simply make inappropriate choices. Moreover, users Web Conference (ISWC09), Chantilly, VA: Springer-Verlag, pp. appear to be just as successful generating Linked Data 998-1002, 2009. when they use an interface that only supports resources. We [15] Noy, N., Grosso, W., and Musen, M. “Knowledge- therefore recommend that tools for authoring Linked Data acquisition interfaces for domain experts: An empirical not include literals in the interface. evaluation of Protege-2000,” Proceedings of the 12th Internal 3. 
A scheme for allowing users to express n-ary relations with Conference on Software and Knowledge Engineering, Chicago, modified predicates, rather than with traditional predicate USA, July, 5-7, 2000. reification, can enormously increase their success in [16] Pietriga, E. “Isaviz: a visual environment for browsing and modeling such information. authoring rdf models,” Proceedings of the Eleventh 7. ACKNOWLEDGEMENTS International World Wide Web Conference, 2002. We would like to thank Trillane Burlar and Christopher (Shane) [17] Schaffert, S., Eder, J., Grünwald, S., Kurz, T., Radulescu, Voisard for their inimitable development work, without which M., Sint, R., and Stroka, S. “KiWi–A Platform for Semantic this project would not have been possible. Social Software.” Proceedings of the Workshop on Semantic Wikis, in conjunction with The 6th European Semantic Web 8. REFERENCES Conference (ESWC09), 2009. [1] Anderson, J. Cognitive Psychology and its Implications [18] Simpson, E. “Measurement of diversity,” Nature, vol. 163, (Seventh Edition), Worth Publishers, 2009. 1949, p. 688. [2] Auer, S., Dietzold, S., and Riechert, T. “OntoWiki-A Tool [19] Sowa, J. Conceptual structures: information processing in for Social, Semantic Collaboration,” Lecture Notes in Computer mind and machine, Reading, MA: 1984. Science, vol. 4273, pp. 736-749, 2006. [20] Staab, S., Maedche, A. and Handschuh, S. “Creating [3] Berners-Lee, T. “Linked Data – Design Issues,” 2006. metadata for the semantic web: An annotation framework and Available at: http://www.w3.org/DesignIssues/LinkedData.html. the human factor,” Technical Report 412, Institute AIFB, [4] Berners-Lee, T., Hollenbach, J., Lu, K., and Presbrey, J. University of Karlsruhe, 2001, 2001. “Tabulator Redux: Writing Into the Semantic Web,” [21] Suchanek, F., Kasneci, G., and Weikum, G., “Yago: a core Proceedings of the 2008 Linked Data on the Web Workshop of semantic knowledge,” Proceedings of the 16th international (LDOW08), 2008. 
conference on the World Wide Web, p. 706-715, 2007. [5] Bollacker, K., Evans, C., Paritosh, P., Sturge, T., and [22] Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., and Taylor, J. “Freebase: a collaboratively created graph database Studer, R. “Semantic Wikipedia,” Proceedings of the 15th for structuring human knowledge,” Proceedings of the 2008 international conference on the World Wide Web, Edinburgh, ACM SIGMOD International Conference on Management of Scotland: ACM, pp. 585-594, 2006. Data, Vancouver, Canada: ACM, 2008, pp. 1247-1250. [23] W.A. Woods, “What's in a link: Foundations for semantic [6] Bollacker, K. Tufts, P., Pierce, T., and Cook, R. “A Platform networks,” in D. G. Bobrow and A. Collins, eds, Representation for Scalable, Collaborative, Structured Information Integration,” and Understanding. Academic Press, New York, 1975. Proceedings of the 6th International Workshop on Information Integration on the Web, Vancouver, Canada, 2007.
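As an illustration of the two statistics used in the results, Simpson's Index of Diversity (Section 5.1.2) and Fisher's exact test (Table 1), the following is a minimal Python sketch. It is not the analysis code used in the study. The function names and the sample list of role choices are hypothetical, and the two-sided rule (summing the probabilities of all tables no more likely than the observed one) is an assumption about how the reported p-values were obtained. Under that assumption the sketch reproduces the value reported for item J in Table 1.

```python
from collections import Counter
from math import comb


def simpson_diversity(choices):
    """Simpson's Index of Diversity, D = 1 - sum(p_i ** 2).

    `choices` holds one label per participant (e.g. the role chosen).
    """
    counts = Counter(choices)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed convention: sum the hypergeometric probabilities of every
    table with the same margins that is no more likely than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so exact ties with the observed table count.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical role choices for a single item (not data from the study).
roles = ["Athlete"] * 6 + ["Baseball Player"] * 3 + ["Person"]
print(round(simpson_diversity(roles), 2))  # 0.54

# Item J in Table 1: ST existing/new = 7/11, S existing/new = 0/19.
print(round(fisher_exact_two_sided(7, 11, 0, 19), 4))  # 0.0031
```

A diversity of 0 corresponds to every participant choosing the same role, as for item M in Table 2. The exact test is a natural fit for counts this small, since it needs no large-sample approximation.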