Formalization, User Strategy and Interaction Design: Users' Behaviour with Discourse Tagging Semantics

Bertrand Sereno¹, Simon Buckingham Shum² and Enrico Motta²

¹ Centre for Advanced Learning Technologies, INSEAD, Boulevard de Constance, F-77305 Fontainebleau, France. bertrand.sereno@insead.edu
² Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, UK. sbs@acm.org / e.motta@open.ac.uk

(Work conducted while the first author was at The Open University.)

ABSTRACT
When authors publish their interpretations of the ideas, opinions, claims or rebuttals in the literature, they are drawing on a repertoire of well understood moves, contributing to an extended discourse. Readers also bring their own perspective to documents, interpreting them in the light of their own research interests, and initiating, for instance, new connections that may not have been intended by authors. Collaborative social tagging holds promise as an approach to mediating these processes via the Web, but may lack the discourse dimension that is fundamental to the articulation of interpretations. We therefore take a hybrid, semiformal approach to add structure to freeform folksonomies. Our experience demonstrates that this particular brand of tagging requires tools designed specifically for this sensemaking task, providing enough support to initiate the annotation while not overwhelming users with suggestions. We describe a tool called ClaimSpotter that aims to support this tradeoff through a novel combination of system-initiated tag recommendations, Web interface design, and an expanded conception of how tags can be both expressed and semantically linked. We then report a detailed study which analysed the tool's usability and the tag structures created, contributing to our understanding of the implications of adding structure to collaborative tagging.

Categories and Subject Descriptors
H.1.2 [Information Systems]: User/Machine Systems—Human information processing; H.4.m [Information Systems]: Miscellaneous

Keywords
Social tagging, Sensemaking, Discourse Relations, Semantics, Argumentation, Usability, Pragmatic Web

Copyright is held by the author/owner(s). WWW2007, May 8-12, 2007, Banff, Canada.

1 INTRODUCTION
Our communities, local, national and international, are confronted by problems that are complex due to the changing environment, incomplete or ambiguous information, and stakeholders with different perspectives. Such domains include strategic planning in business, government policy formulation, time-pressured mission operations, and almost all scholarly research. The sensemaking activity that these contexts demand [25] requires analysts to construct plausible narratives that frame the problem, account for the available evidence, and motivate action, enabling "an openly reflexive forum in which communities of knowing explicitly talk about their understandings" [2]. Progress is made by making moves that express and contest interpretations of the world, although these different contexts clearly have very different genres of discourse and criteria for acceptance. The focus of this paper is on the work of academic researchers, but we argue that related work shows that an approach grounded in discourse relations is applicable to a broader range of applications.

It is established from corpus analyses that when researchers publish their interpretations of the ideas, opinions, claims or rebuttals in the literature, they are drawing on a repertoire of well understood moves, contributing to an extended discourse [21]. Although the Internet is accelerating the pace of exchanges, scholarly and scientific discourse still proceeds in the shadow of the printing press, with exchanges now disseminated as digital prose. While information retrieval and text analysis technologies help to infer certain kinds of structure within and between papers, our research is complementary, exploring a 'network-native' paradigm in which the key claims made by an author (and the interpretations made by their readers) are published as explicit new connections to the literature. The research question driving our work is: can we model the discourse structures we find in research communities as explicit structures, and if so, what support tools can we provide to construct, navigate, and interrogate such structures? Such approaches to knowledge publishing and negotiation on the Web will ideally be both quickly learnable and sufficiently expressive to permit researchers to make important scholarly moves, and to assist them in making sense of the emergent structures at scale.

What is especially interesting about scholarly discourse is the fact that "the truth" or "the significance" of any claim is open to contest. While this may be extreme in the case of philosophy and the humanities, it is self-evidently also the case even in computing and the hard sciences. There is no single reading of a paper: interpretations may differ significantly between readers and authors (hence the need for peer review), and readers bring their own unique perspective to a paper, seeing new connections that the author may never have intended. Seeing old things in new ways is the essence of creativity. This is the orientation we bring to harnessing the power of social tagging, augmented with discourse semantics, as we strive to create effective infrastructure for scholars to express—and contest—claims to knowledge.

2 SCHOLARLY TAGGING

2.1 Questions no search engine can answer
Consider the following questions that interest students and researchers, but which neither Internet search engines nor domain-specific digital libraries can assist in answering: What data refutes this hypothesis? Are there different schools of thought in this field? Is there an analogy between this process in fields X and Y? Why does this paper cite that one? How did these contrasting perspectives interpret this result?

The answers to these questions are grounded in the discourse moves that researchers make in their writing: the arguments, rhetoric and positioning of their claims with respect to the literature. In our present infrastructures, these are questions that can only be answered by reading the paper, although there is active research on the automated analysis of argumentative relationships between papers [22]. These are fundamentally issues of interpretation, which fall outside ontology-based Semantic Web approaches that model stable, consensus, 'objective' worlds (albeit always from a perspective). Nor can they be answered by scientometrics (e.g. citation analysis), which does not have enough insight into the nature of the moves being made. We are now squarely in the realm of pragmatics, where meaning derives from interpretation, perspective, contextualisation and argumentation—in other words, the construction of plausible narrative, as introduced at the start.

2.2 Discourse semantics for annotating claims
We take a hybrid, semiformal approach to add structure to freeform folksonomies. Details can be found in [4, 13, 24]:¹

1. As with folksonomies, tags remain unconstrained freetext strings, although users can choose to take care to reuse existing tags in order to increase the visibility of their tagging, or to discover new connections. In our context, however, tags may become phrases or even a sentence or two if they are used to express, for instance, a hypothesis, a prediction or a research result.

2. A critical difference is that tags may be linked not just to a URI, but to each other. We term a tag—relationship—tag triple a claim, that is, a meaningful connection being asserted between two ideas. A claim may also link from/to other claims, as the ideas grow in complexity. A claim is also directed: it has a source and a destination tag.

3. Tags are linked using a typology derived from argumentation and the most common moves made in research publications. Users select the relationship from a menu of predefined relationships (e.g. is consistent with, refutes, addresses, solves, improves on, is analogous to, uses/applies).

4. Tags may optionally be classified (e.g. problem, evidence, data, method, theory), but these are pragmatic, contextual roles, holding only in the context of a particular claim. Thus, in one context, a research result might be a problem, while, in another context, it might be an assumption.

¹ The work of ISO/TC37/SC4 shares a common interest in discourse and coherence relations: http://www.tc37sc4.org

2.3 Relation to our previous work
Elsewhere [4, 24], we have demonstrated how a digital library can be tagged in this way with annotation tools, the resulting network navigated via interactive visualizations, and the semantic searches enabled by modelling discourse relations (e.g. show papers that support, or contrast in some way with, this paper; show the lineage or ancestors of the idea represented by this tag). We have also evaluated how students make use of some of the tools to navigate and search prepopulated networks modelling a literature [4].²

Having demonstrated the potential of scholarly publishing and annotation using discourse relations to annotate texts, the challenge (as for any structured knowledge capture tool) was: can users do this? To date, we have not presented data on tag authoring behaviour. This paper reports the first quantitative and qualitative analysis of the ways that novices and experts approached semantic tagging in their first encounter with a software tool. Semantic tagging behaviour is inextricably linked to (1) the semantic scheme, as introduced above, and (2) the user interface and functionality of the tagging tool, introduced next.

² Demonstrations and screencasts: http://claimaker.open.ac.uk

2.4 ClaimSpotter
The previous annotation tools developed to support our tagging approach did not allow direct annotation of the target document. This was an explicit goal with ClaimSpotter.

Figure 1: The ClaimSpotter interface. Key: [1] the My/All Tags toolbar button highlights text matching tags on this paper (just the user's, or all tags), e.g. [2] "measures of trust in the content of Web resources". Clicking a highlighted tag enters it in the tag linking form. Similarly, the Relations filter [3], optionally applied to the whole document [4], highlights verbs matching, or synonymous with, the link types; e.g. [5] the verb "describes" in the text has been matched to the tag relation is about. Tag triples can be built from existing tags using the [X,.,.] and [.,.,X] buttons [6] to specify the left and right sides of the triple. The tag link is selected from the menu [7]. Notes can be saved as tags [8], which, like the document text, can on request be parsed for matching tags and relational types.
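The tagging scheme of Section 2.2 can be read as a small data model: freetext tags, optional contextual types, and directed, typed links whose ends may themselves be claims. The following Python sketch is our own illustration, not ClaimSpotter's published implementation; the class names, and the subset of relations shown, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union

# An assumed subset of the 36 predefined relations mentioned in the paper.
RELATIONS = {
    "is consistent with", "refutes", "addresses", "solves", "improves on",
    "is analogous to", "uses/applies", "is about", "is evidence for",
}

@dataclass
class Tag:
    text: str                       # unconstrained freetext: a word up to a sentence or two
    tag_type: Optional[str] = None  # optional contextual role, e.g. "problem", "evidence"

@dataclass
class Claim:
    source: Union[Tag, "Claim"]     # claims are directed: a source ...
    relation: str                   # ... linked by a relation chosen from the menu ...
    target: Union[Tag, "Claim"]     # ... to a destination; nesting a Claim yields a compound claim

    def __post_init__(self):
        # Relations are constrained to the menu, unlike the freetext tags.
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation!r}")

# The compound example reported later in the paper: a tag linked to another triple.
inner = Claim(Tag("Magpie"), "improves on", Tag("COHSE"))
outer = Claim(
    Tag("Magpie moves away from hypermedia towards open service-based architectures"),
    "is evidence for",
    inner,
)
```

The asymmetry is deliberate: tags stay folksonomy-like and unconstrained, while the connective tissue between them is the formal part of the scheme.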
Further examples and screencasts: http://kmi.open.ac.uk/projects/hyperdiscourse/tools/claimspotter

ClaimSpotter was designed to support document sensemaking tasks: reading, highlighting areas of potential interest, making notes, looking for specific kinds of papers in the bibliography, and so forth. While researchers clearly do this all the time on paper, or with freetext annotations in various document viewers, the challenge was to support users in these tasks with our semantic tagging approach. ClaimSpotter's design aims to initiate and sustain a dialogue between annotators and the target document, via (i) content-based support for tagging, in the form of recommendations, and (ii) an interface displaying these recommendations overlaid on the text (cf. Figure 1). Details are in [18, 19]; we turn now to the evaluation study.

3 USER EVALUATION STUDY
There are many types of evaluation. A summative analysis could evaluate technical performance (e.g. of recommendation agents), or characterise the impact of a tool on the practices of researchers (which one might do with mature tools like Google, Wikipedia or del.icio.us). We conducted a formative evaluation of a new prototype, in order to develop a language in which to describe as yet poorly understood phenomena. Our specific objective was to characterise how annotators approached the task we gave them with an unfamiliar tool, paying particular attention to how the affordances of the user interface (that is, the visual cues it provided for interaction) shaped tagging behaviour, summarised quantitatively against various measures and explained through qualitative coding of the data.

3.1 Methodology
We recruited 13 annotators (referred to as a1-a13), who used ClaimSpotter to annotate a 2-page research paper which they had preferably authored, or were at least very familiar with, to avoid any comprehension problems. Ten were PhD students, two were research fellows and one was a professor. None had used ClaimSpotter before. Four of them (a1-a4: one student, the two research fellows and the professor) were considered 'experts' in the tag linking scheme, being members of the project team. The remaining nine (a5-a13) were considered 'beginners'. Each session was limited to one hour. Screen interactions were recorded with a capture tool, and all comments and discussions were recorded, resulting in high quality audio-visual data as digital movies. A tutor (the first author) bootstrapped each annotation process by defining a few tags for each document. He was also present throughout the session to provide assistance when needed, and to engage in discussion when suggestions were made. A questionnaire sent one week after the experiment was designed to elicit opinions on the main strengths and weaknesses of the interface, and on the ways it could be improved. See [18] for more details.

4 QUANTITATIVE ANALYSIS
257 tags and 160 claims were submitted, giving on average 19.8 tags and 12.3 claims per annotator, with no major difference between the 4 experts and 9 beginners, the former entering marginally more tags (a mean of 20.75 against 19.3) and links (a mean of 14.75 against 11.2) than the latter.

4.1 Tags
Most tags submitted were short: 164 out of 257 (64%) were three words or fewer. Short tags (representing proper nouns, acronyms or project names) were submitted as frequently by novices as by experts. Most duplicated tags were used twice, while a handful were used three times. Duplicated tags were either created 'explicitly', by reusing a tag previously created in the current document, or 'implicitly', by typing a text string which happened to be already in use as a tag. However, the documents chosen by participants were so different that duplicates were mostly due to annotators reusing a tag created beforehand by the tutor. We also noticed that reused tags were not composed of short tags only: some longer tags were reused.

4.2 Tag triples ("claims")
22 relation types (out of the 36 available) were used; 7 of these 22 were used only once or twice. 'General' relations were the most frequently used, but it is difficult to generalise about raw frequency, as the papers considered were different. A more interesting aspect is to identify which relations were the most consistently used across annotators. The relations uses/applies/is enabled by and is about were the two most consistently used: only 3 annotators did not use the former at all, and only 4 did not use the latter.

The examples below demonstrate the variety of tag triples created by participants:

[Domain ontology, is about, A hierarchy of URIs on multiple levels]

[Universal physical access, is unlikely to affect, Digital divide]

[Hypertext node juxtaposition, is analogous to, Cinematic shot juxtaposition]

[(Evidence) In the Bristol trial, the awareness of the presence of other players was correlated with how much our participants enjoyed the game as well as with how engaged they felt, is consistent with, Presence awareness of many other people is capable of causing feel good factor]

[Magpie moves away from hypermedia towards open service-based architectures, is evidence for, [Magpie, improves on, COHSE]]

It can be seen that tags ranged from single words to a sentence, and are optionally given a type (cf. the fourth example). In the last example, a tag is linked to another triple to create a compound claim.

4.3 The is about link
If we consider conventional tagging on the Web, the assignment of a tag to a URI is semantically very close to simply asserting that the content is about that tag. We performed a detailed evaluation of the use of the is about link, since it was one of the most commonly used. It is what we might term a 'less committing' link compared to stronger, more argumentative relations such as challenges, proves, or is analogous to. This of course does not mean that is about links have little value: they have as much value as current tagging practices, and when used between two tags, such a connection can express a valuable and surprising stance if the two were previously thought unrelated.

Experts submitted proportionally fewer is about links than beginners, which we attribute to their greater awareness of the other links available. Beginners, by contrast, were more likely to use is about as a placeholder 'catch-all' link, especially when they had not yet established whether the link they had in mind was on the menu (see the user strategy 'Starting from the tags' discussed shortly). Those annotators who made more links made proportionately more is about links. In contrast, annotators who made fewer links made almost no use of them at all; it appears that they focused directly on forging stronger links.

If we divide each annotator's total link set in half, we find more is about links in the first half than in the second half. We interpret this as confirming the idea that this lower-commitment link helped to scaffold users into this new mode of tagging. 8 annotators out of 13 submitted at least one is about link. As they became more knowledgeable about the process and the links available, there seemed to be less need to fall back on is about. It can therefore be seen as a mechanism to incrementally formalize [20] one's tagging. We can imagine ClaimSpotter prompting annotators at a later stage to review whether to 'upgrade' is about links to more specific ones.

5 QUALITATIVE ANALYSIS
The qualitative analysis focused on the audio-video data. We used a shallow Grounded Theory methodology to code the video transcripts (to create concepts) and organise them (in order to draw relations between these concepts) [7]. The outcome of this methodology was, in Grounded Theory terms, a 'theory', that is, a set of plausible relationships holding among multiple concepts. Concepts emerged from the analysis and were constantly compared against each other, through specialization of codes into sub-codes or, vice-versa, consolidation of sub-codes into parents (called categories). Finally, a stable state (the point of theoretical saturation) was reached, where the codes were judged to account for the salient phenomena. The final taxonomy is given in Table 1, providing a more nuanced vocabulary than was available prior to the study in which to describe users' tagging behaviour with ClaimSpotter. Discussion is organised around the three top-level themes: Formalization, User Strategy and Interaction Design.

Table 1: Extract of the data coding scheme which emerged from the analysis of tagging behaviour: themes, categories, sub-categories and codes.

Formalization
    creating a tag
        choosing a tag type
            appropriate tag type
            not perfect tag type
            problem with or lack of a tag type
            cannot find a tag type
        removes tag type
        deletes tag
    creating a claim
        choosing a relation
        removing a claim
        ...
    discussion about formalism
User Strategy
    keeping things simple
        reducing amount of information on screen
    looking for ideas
        focussing on a particular area
        hiding an area
        ...
    starting a claim from the tags
    starting a claim from the relation
    typing or selecting a tag
    incremental formalization
    reusing a tag or a claim previously submitted
    ...
Interaction Design
    consistency
    feedback
    ...
Miscellaneous
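The first-half versus second-half comparison used in the is about analysis of Section 4.3 is straightforward to state as code. The sketch below is our own, with an invented link sequence for a hypothetical annotator, not the study's data:

```python
def is_about_shares(links):
    """Fraction of 'is about' links in the first and second half of a
    chronologically ordered sequence of relation names."""
    mid = len(links) // 2
    first, second = links[:mid], links[mid:]
    return (first.count("is about") / len(first),
            second.count("is about") / len(second))

# Invented sequence: low-commitment 'is about' links early in the session,
# stronger argumentative relations later.
links = ["is about", "is about", "is about", "uses/applies",
         "is about", "refutes", "improves on", "is analogous to"]

first, second = is_about_shares(links)
assert first > second  # the scaffolding pattern the study reports
```

A tool could run exactly this check per annotator to decide when to prompt for 'upgrading' is about links to more specific relations.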
5.1 Theme 1: Formalization
The analysis of behaviours grouped under Formalization yields insights into the degree of cognitive effort it took users to use the new structured tagging scheme.

Assigning types to tags
Most users decided not to add a tag type simply because it was optional: types were assigned 34 times, out of a total of 257 tags. Twice, types were explicitly not assigned because there were too many ("The interesting thing is that this specific example (tag) could fall in different categories.") and once because there were not enough ("It's not a problem, it's not a solution, and it's not a methodology. I'd like something that says research field"). Search was not part of this evaluation task (it was the focus of a previous experiment [23]). We have not yet gathered longitudinal data with extensive tag authoring and searching, but we hypothesise that as users learn that they can search on types (e.g. find all instances where this tag was considered an assumption), they might start to assign them in anticipation. This is analogous to expert users formulating compound specializations of tags in Web social bookmarking: users will willingly add tag complexity as it serves their anticipated needs.

Relation types
An appropriate relation was found in 115 occurrences, out of 160 in total. However, just as we found with choosing a tag type, we observed difficulties in choosing a relation type:

• On 8 occasions, a 'good enough' relation was found. This means that the annotator kept and submitted the triple, although it did not express completely what she had in mind ("I can say is similar to, since there is nothing else better than that");

• On 6 occasions, the problem was more acute: "The relation (that I want) is not there. So what do we do?" This resulted in the removal of the whole triple that was being created.

Multiple attempts were sometimes needed to get a claim right. This implied trying different relations and finding out which one looked (and, actually, sounded, as annotators very often said them aloud) best, flipping the source and destination tags, or reformulating a tag to make it suit a given relation. We recorded 11 incidents in which an annotator had to reformulate the wording of a tag because of a relation.

'Good' and 'bad' tags
One annotator commented that a tag she was considering adding was "a silly tag" but that she would "make it anyway", because it was of interest to her. She then added: "I'm not sure if that tag's going to be good. Maybe some of these tags are less useful than the others." Prompted to comment on her notion of tag utility, her answer was most interesting: "A good tag will be something that is consistent, something that would appear again and again in the document. [Tag name] is a good tag for instance, compared to something I would use only once." This notion of quality derived from potential reusability, which is clearly the conventional understanding that users bring to tagging. It puts a premium on short tags referencing real-world entities, such as the names of theories, algorithms, problems or methods. These are, of course, the sorts of entities that are extractable automatically, compared to the more complex tags that ClaimSpotter supports, which were more novel to users and were used less frequently. In devising an interface for more subjective, interpretative tags, this comment gave us pause for reflection on how the interface could have encouraged richer tags, to move users beyond the stereotype. See also the later discussion on the bias we unwittingly gave in the user interface to short, matched tags, which reinforced this emphasis.

5.2 Theme 2: User Strategy
Users are hard to predict: each brings his or her own unique knowledge of their domain, and varying expectations about the formalism and tool. Although we might have expected as many strategies as we had annotators, we believe we have identified several patterns.

Roles played by recommendations
We noted a difference in the amount of support annotators wanted from the interface, and from its ability to extract and 'recommend' elements through text highlighting. Beginner annotator a7 made little use of the recommendations and spent most of the experiment inputting her own tags and claims, while all the other participants did use the suggestions. Expert annotator a1 preferred at one point to deactivate the suggestions because, in her words, "I don't want to be distracted by having too many things going on. At the moment, it seems to be quite complicated. I'd rather keep it simple." Later, however, she made use of the recommendations "to see if there's anything inspirational (in this part of the document)".

Recommendations were typically used to reduce the document to a set of potentially interesting focal fragments. They could also be activated to discover (and reuse) existing tags, to position an argument with respect to peers' tags, to find out how a particular tag was used over the corpus, to find peers' tags and claims, to indicate which tags were associated to a cited document, or to indicate how a cited document was assessed by its author.

Incremental formalization
Tags and claims were not necessarily submitted immediately. Instead, they were often kept on the screen because annotators felt the need to see them to facilitate the creation of claims. Saying the relations aloud was also a phenomenon we often noticed, as mentioned earlier.

Another strategy-related phenomenon concerned the order in which annotators accessed the different resources at their disposal. They seemed to focus first on making their own annotations (possibly to get their feet wet with the formalism) before browsing through the history and looking for relevant tags and claims from their peers. This may have been an experimental artefact (the need to 'get something done' within the hour): "For the time given, the easiest thing is to see the system suggestions and make your own. Because going back and looking through the history may just take too much time." But it may also have to do with a desire to appropriate the document first, to make it their own, before turning to what their peers said about it.

Starting from a relation vs. starting from tags
We also observed a striking difference between how (mostly) experts started from the relation type they wanted to use for a claim, and how (mostly) beginners started from the two tags they wanted to put in relation, without knowing if the relation type they wanted existed. On reflection, this phenomenon is not surprising, but this was the first empirical evidence we had of it.

Towards a new kind of annotation process?
Although a few of the claim-spotting filters did exhibit some unwanted results on the papers provided by the annotators, the visual noise levels were not as damaging as had been feared. What was of interest to us was whether the very presence of recommendations and peers' tags shaped annotators' behaviour. We characterise the effect that highlighted tags had as follows: from a situation in which annotators are given no cues as to how to tag a document, we moved to a situation in which they had to decide whether an existing tag was good material to make a tag or a claim. We felt that annotation moved towards making a Yes/No decision in response to each recommendation. We will revisit this point later.

5.3 Theme 3: Interaction Design
We studied in detail the annotators' interactions with the interface, and concluded that the environment was reasonably intuitive within the constraints of the recorded task of annotating a single document. Longitudinal evaluation with large tag sets will undoubtedly reveal other design weaknesses.

Successful features
The presence of pull-down menus of tag types and relations on the screen succeeded as a visual scaffold: "I'm looking through the types because I'm not familiar with them." The presence of the multiple tag types available also drew annotators' attention to specific aspects of the paper that they might choose to focus on, e.g. what is the problem tackled, or the methodology proposed? The tag-linking features were also very successful, encouraging a playful approach: the act of combining and swapping tags between the left and right sides of the link was made easier by not having to retype them, and introduced a bricolage aspect that encouraged experimentation.

As users gain confidence with a tool, they develop interaction routines, that is, compilations of micro-actions. These routines provide us with another way to describe the coupling between user interface and structured tagging.

Annotating and checking for visual feedback
This move illustrates the dominance of 'visible' tags, which, as discussed, had not been foreseen. Users would select, copy and paste some text from the document into a tag, submit it, and immediately activate the 'my tags' filter to see it appear highlighted in the text, confirming that it had been recorded.

Navigating and tagging by document section
A simple routine was navigating via the contents menu to a particular section, reading or skimming it, and summarising it via a tag. This enabled a user to work through the text systematically, and confirmed the value of integrating the document and the annotation in a seamless interface.

Navigating and tagging by recommendation
A variation on this was to work from the output of filters: switching on a filter, looking at a highlighted area in the document, reflecting on it, modelling a tag or a claim, and moving to the next highlighted area. This sequence again confirmed the ability to move fluidly between engaging with the document and tagging, with highlighted tags in the text acting as attention-catchers.

Combining tags into claims
The process of claim authoring evolved into a recognisable pattern: creating a tag, creating another tag, combining them in a claim, looking for a discourse link, not finding one, flipping the order of the tags in the relation, and finally finding an appropriate relation.

Reusing and adapting peers' tags
Some users learnt to use the less obviously available history window (listing, among other things, non-matched tags). Consulting the tags available and reusing one or more in one's own tag space demonstrated that annotators did benefit from peers' tags.

6 DESIGN WEAKNESSES
In this section, we reflect on some of ClaimSpotter's design weaknesses, and consider improvements that may also be of relevance to other collaborative knowledge structuring tools.

6.1 Information overload?
ClaimSpotter's filters were designed to address the challenge of supporting an annotator in the task of locating and tagging a document's contributions. The presence of highlighted tags and text fragments undoubtedly shaped the annotation process, and we have evidence that annotators valued seeing these, with some variation in when they activated them. We did find evidence, however, that there may have been too much information. As mentioned by a4: "the problem is, do you make your own claims, do you follow the system, do you go back to the history to see what the other people have said?"

There is no question that, for a one-hour experiment, there was indeed a lot of information to understand and digest. More studies are needed to introduce the different sources of support more gradually, and to let annotators decide which ones work best for them. Better ways to organise these recommendations also need to be devised (work has begun on a dialogue assistant that helps annotators ask themselves focused questions about the document, and which suggests recommendations for each question).

6.2 'Current-document centeredness'
New users will focus on what they are offered by the display. ClaimSpotter's document-centric design emphasised the current document, at the expense of easy access to cited documents, for instance. Our conclusion is that this resulted in a limited number of claims connecting tags originating in different documents. However, our other work has evaluated user interfaces that foreground the tag space structure, providing a complementary perspective [23].

6.3 User 'laziness'
Our objective was to devise a more active interface to suggest possible tags. We now play devil's advocate and ask whether tags and claims would not be more reflective if they had to be devised manually by the annotator. By saving the annotator the cognitive effort of formulating their own tags, are we undermining the very process we want to promote?

We observed a tendency to create (i.e. reuse) tags from text fragments highlighted in the document by the recommendation filters. Some of these were copied and edited to taste, but they were nevertheless heavily inspired by the highlighted elements in the original document. While this seems to be a 'good' thing, both in terms of usability (it lowers the barrier to constructing semantic literature models) and in terms of building a network promoting the reuse of tags, there is a corresponding risk that less effort is put into the annotation: the user comes to expect the system to bring her the salient facts about a document (whether these are composed of important sentences, or of matched existing tags). While this may represent a new paradigm for scanning and tagging documents, we are also cautious about the implications. Lazy annotators may be tempted to accept suggestions without critically assessing them, resulting in the propagation of poor tags. Within an educational context, one possibility would be to keep tag suggestions and automatic text highlighting at an imperfect level, to maintain students' vigilance.

6.4 Interface bias towards 'matched' tags
Let us now consider the 'matched tags' recommender. Matched tags (exact matches, as with most social tagging tools) were privileged in the user interface over non-matched ones: the former were visible via the activation of a filter and highlighted in bright yellow zones directly in context within the document, while the latter were 'hidden' in the separate history window. Matched tag highlighting gives immediate feedback to annotators, and the satisfaction of seeing one's tags highlighted in the text is akin to that gained in social bookmarking when one's tagged pages show up with the rest of the world's. However, we again raise the question of the quality of tagging, whereby the emphasis could shift from reflectively submitting new tags to submitting 'visible' tags (that is, tags matched by the dedicated recommender). Better presentation options must be devised, including a mechanism to display 'non-matched' tags in the main window.

7 RELATED WORK
…tools that forge a link between argumentation and current Web annotation tools and practices [11].

Our work builds on research into readers' annotation practices, in which annotation is a means to record personal ideas and interpretations, including connections to additional scholarly documents, reformulations of the author's arguments, assessments of its significance, or 'warning' signals to indicate key passages [14]. However, we are exploring the representational and interactional requirements for tools that enable these personal perspectives to be made public as a semiformal network that can be managed, extended, and contested. Current annotation tools [16] provide no support to manage what might be thought of as large-scale annotations on annotations.

Ontology-based annotation tools are being developed as an essential part of the Semantic Web movement. However, these applications may in fact be better characterised as supporting the 'translation' of information in the document into ontological entities. Although there may be debate about how to map an entity into an ontology, the material itself is not normally the focus of contention (such as the names of people, events, locations, processes). The tools certainly do not aim to support debate about the significance or meaning of an entity in a document.

Our use of recommendation filters derives from work on the summarisation of scientific papers. Potentially relevant passages can be delimited with multiple approaches, based on (i) the structure of the (scholarly) document [1], (ii) surface-based features [11], (iii) topical coherence [17], and (iv) rhetorical coherence measures [22]. Other work on literature-wide analysis on which we could draw includes the identification of relevant documents by analysing their citation sections [9, 12]. Pivotal points can also be proposed to filter a network of documents and retain only the most important ones [5]. Nanba et al. [15] also propose an approach to both identify reference areas and the role [26] played by these areas. They consider the following roles: references indicating other researchers' theories or methods used as a basis, references to related works to mention a
Although this has contrast or a problem and other references. not been verified, it may be that the user interface design led Since researchers clearly need to annotate domain annotators to forget that there might be other tags: it terminology, Semantic Web annotation tools are part of the certainly did not actively remind them. This may have led solution. In CREAM [8], an annotation by mark-up mode is them to submit more ‘copied-and-pasted’ tags. This added provided, enabling the user to select any piece of relevant focus on the visual salience of highlighted text spans may information from the page and drag and drop it to create or also mean that matched tags became a way to cover the instantiate the selected concept instance (researcher name, document with tags. By doing this, annotators received address…) Text fragments are extracted from the page to implicit feedback that they had read the document. foster a semi-automatic annotation: the knowledge expert agent only has to validate the extracted elements. However, following the social tagging paradigm, 7 RELATED WORK annotators in our approach will tag only those elements in a Our work is one strand in research on computational text that reflect their interests (there is no gold standard set modelling of argumentation (e.g. COMMA [6]), but while of tags that can be automatically extracted, since there is no other work focuses on the formalization of human or agent single, authoritative meaning). As we have argued on argument structures and processes, we place more emphasis theoretical grounds elsewhere, the representational on interaction design, and on the development of software requirements for modelling discourse are different [13]. 
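The representational scheme running through this discussion — freeform tags combined into claims via a small set of predefined discourse relations (‘research moves’), with ‘matched’ tags being those found verbatim in the text — can be captured in a minimal data-model sketch. The class names, relation labels and matching rule below are illustrative assumptions for exposition only, not ClaimSpotter’s actual implementation:

```python
from dataclasses import dataclass

# Illustrative 'research move' relations; ClaimSpotter's actual
# relation vocabulary is not reproduced here.
RELATIONS = {"proves", "refutes", "extends", "uses", "is-about"}

@dataclass(frozen=True)
class Tag:
    """A freeform tag, possibly copied from a highlighted text fragment."""
    author: str
    text: str
    source_doc: str

@dataclass(frozen=True)
class Claim:
    """A claim triple: subject tag -- discourse relation --> object tag."""
    subject: Tag
    relation: str
    obj: Tag

    def __post_init__(self):
        # Relations are predefined so claims stay filterable and searchable.
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown discourse relation: {self.relation}")

def flip(claim: Claim) -> Claim:
    """The 'flip the order of the tags' step seen in the authoring pattern."""
    return Claim(claim.obj, claim.relation, claim.subject)

def is_matched(tag: Tag, document_text: str) -> bool:
    """A tag is 'matched' when its text occurs verbatim in the document."""
    return tag.text.lower() in document_text.lower()
```

The point of the triple structure is that the relation, not just the tag, carries the annotator’s interpretive move — which is precisely what flat, entity-oriented annotation schemes leave out.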
This work is therefore better framed not so much as Semantic Web (controlling interpretation through consensus domain models) than as Pragmatic Web (foregrounding context, argument, interpretation and perspective) [3].

8 CONCLUSIONS AND FUTURE WORK
We offer this analysis as an example of a human-centred design process for collaborative knowledge structuring environments. We hope that the particular approach we are developing contributes to wider efforts to add greater representational expressiveness to social tagging, without in the process straitjacketing it.

Social bookmarking via freeform ‘folksonomic’ tagging is demonstrating its huge potential for collective indexing of materials through emergent vocabularies. In our approach, we have preserved the freedom that folksonomic tagging permits in what counts as a ‘tag’, added the option to classify tags, and introduced the option to link tags using familiar ‘research moves’, predefined in order to leverage automated filtering and search. The ClaimSpotter prototype supports the collaborative annotation of documents using this representational scheme. We have summarised a detailed analysis of how annotators made use of the tool in their first hour of usage, describing the results under the themes of Formalization, User Strategy and Interaction Design.

This work is being developed in several directions. There is clearly scope to improve the interface design, and to add the kinds of flexibility that we see in social tagging interfaces, such as recording tags as private, personalising recommendation filters, and enabling richer user profiles. ClaimSpotter is one of a suite of tools being developed in the Hypermedia Discourse project (http://kmi.open.ac.uk/projects/hyperdiscourse), in which we are now developing a server to provide coherence relations-based tagging services, which we conceive as a form of web pragmatics (http://www.pragmaticweb.info).

We are also testing the generality of the approach outside scholarly discourse, exploring the use of recommendation filters and discourse links in the Laboranova project (http://www.laboranova.com), which is focussing on the early stages of innovation, when ideas are developed, debated, improved and evaluated. We are exploring the possibilities of introducing stimulus agents for serious games to strengthen proposals for innovation development by suggesting argumentative connections between ideas, supporting examples, diagnostic tool outputs or relevant experts.

9 ACKNOWLEDGMENTS
We are grateful to the reviewers for their helpful feedback. This research was supported by the Advanced Knowledge Technologies (AKT) project, an Interdisciplinary Research Collaboration (IRC) sponsored by the UK Engineering and Physical Sciences Research Council (GR/N15764/01). The AKT IRC comprised the Universities of Aberdeen, Edinburgh, Sheffield, Southampton and The Open University.

10 REFERENCES
[1] P. Bishop. Digital Libraries and Knowledge Disaggregation: the Use of Journal Article Components. In Proceedings of the 3rd International Conference on Digital Libraries. ACM, 1998.
[2] R. J. Boland and R. V. Tenkasi. Perspective Making and Perspective Taking in Communities of Knowing. Organization Science, 6: 350–372, July 1995.
[3] S. Buckingham Shum. Sensemaking on the Pragmatic Web: a Hypermedia Discourse Perspective. In Proceedings of the 1st International Conference on the Pragmatic Web. GI Lecture Notes in Informatics, September 2006.
[4] S. Buckingham Shum, V. Uren, G. Li, B. Sereno, and C. Mancini. Computational Modelling of Naturalistic Argumentation in Research Literatures: Representation and Interaction Design Issues. International Journal of Intelligent Systems, 22(1): 17–47, 2006.
[5] C. Chen. The Centrality of Pivotal Points in the Evolution of Scientific Networks. In Proc. Int. Conf. Intelligent User Interfaces, pages 98–105. ACM.
[6] COMMA: 1st Int. Conf. on Computational Modelling of Argumentation (Sept. ’06), Liverpool, UK. IOS Press.
[7] B. G. Glaser and A. Strauss. Discovery of Grounded Theory: Strategies for Qualitative Research. Sociology Press, 1967.
[8] S. Handschuh and S. Staab. Authoring and Annotation of Web Pages in CREAM. In Proc. WWW2002: 11th Int. World Wide Web Conference, 2002.
[9] S. Hitchcock, L. Carr, Z. Jiao, D. Bergmark, W. Hall, C. Lagoze, and S. Harnad. Developing Services for Open Eprint Archives: Globalisation, Integration and the Impact of Links. In Proceedings of the 5th Int. Conference on Digital Libraries. ACM, 2000.
[10] C. M. Hoadley and M. C. Linn. Teaching Science Through Online, Peer Discussions: SpeakEasy. In The Knowledge, January 2005.
[11] J. Kupiec, J. Pedersen, and F. Chen. A Trainable Document Summarizer. In Proceedings of the ACM SIGIR’95 Conference, pages 68–73. ACM, 1995.
[12] S. Lawrence, C. L. Giles, and K. Bollacker. Digital Libraries and Autonomous Citation Indexing. IEEE Computer, 32(6): 67–71, 1999.
[13] C. Mancini and S. Buckingham Shum. Modelling Discourse in Contested Domains: a Semiotic and Cognitive Framework. International Journal of Human Computer Studies, 64(11): 1154–1171, 2006.
[14] C. C. Marshall. Annotation: from Paper Books to the Digital Library. In Proceedings of the 2nd ACM International Conference on Digital Libraries, pages 131–140, Philadelphia, PA, USA, 1997. ACM.
[15] H. Nanba and M. Okumura. Towards Multi-Paper Summarization using Reference Information. In Proceedings of the IJCAI’99 Conference, pages 926–931, 1999.
[16] I. Ovsiannikov, M. A. Arbib, and T. H. McNeill. Annotation Technology. International Journal of Human Computer Studies, 50(4): 329–362, 1999.
[17] G. Salton, A. Singhal, C. Buckley, and M. Mitra. Automatic Text Decomposition Using Text Segments and Text Themes. In UK Conference on Hypertext, pages 53–65, 1996.
[18] B. Sereno. A Document-Centric Semantic Annotation Environment to Support Sense-Making. PhD thesis (also available as Technical Report KMI-06-13), Knowledge Media Institute, The Open University, Milton Keynes, UK, September 2005.
[19] B. Sereno, S. Buckingham Shum, and E. Motta. ClaimSpotter: an Environment to Support Sensemaking with Knowledge Triples. In Proc. Int. Conf. Intelligent User Interfaces, pages 199–206. ACM.
[20] F. M. Shipman and R. McCall. Supporting Knowledge Base Evolution with Incremental Formalization. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 285–291. ACM, April 1994.
[21] J. M. Swales. Genre Analysis: English in Academic and Research Settings. Cambridge University Press, 1990.
[22] S. Teufel and M. Moens. Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status. Computational Linguistics, 28(4): 409–445, December 2002.
[23] V. Uren, S. Buckingham Shum, G. Li, and M. Bachler. Sensemaking Tools for Understanding Research Literatures: Design, Implementation and User Evaluation. International Journal of Human Computer Studies, 64(5): 420–445, 2006.
[24] V. Uren, S. Buckingham Shum, G. Li, J. Domingue, and E. Motta. Scholarly Publishing and Argument in Hyperspace. In Proc. WWW2003: 12th Int. World Wide Web Conference, May 20–24, 2003, Budapest.
[25] K. Weick. Sensemaking in Organizations. Sage Publications, Thousand Oaks, CA, 1995.
[26] M. Weinstock. Citation Indexes. In Encyclopedia of Library and Information Science, volume 5, pages 16–40, 1971.