Formalization, User Strategy and Interaction Design: Users' Behaviour with Discourse Tagging Semantics

Bertrand Sereno¹, Simon Buckingham Shum² and Enrico Motta²

¹ Centre for Advanced Learning Technologies, INSEAD, Boulevard de Constance, F-77305 Fontainebleau, France. bertrand.sereno@insead.edu
² Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, UK. sbs@acm.org / e.motta@open.ac.uk

(Work conducted while the first author was at The Open University.)

ABSTRACT
When authors publish their interpretations of the ideas, opinions, claims or rebuttals in the literature, they are drawing on a repertoire of well understood moves, contributing to an extended discourse. Readers also bring their own perspective to documents, interpreting them in the light of their own research interests, and initiating, for instance, new connections that may not have been intended by authors. Collaborative social tagging holds promise as an approach to mediating these processes via the Web, but may lack the discourse dimension that is fundamental to the articulation of interpretations. We therefore take a hybrid, semiformal approach to add structure to freeform folksonomies. Our experience demonstrates that this particular brand of tagging requires tools designed specifically for this sensemaking task, providing enough support to initiate the annotation while not overwhelming users with suggestions. We describe a tool called ClaimSpotter that aims to support this tradeoff through a novel combination of system-initiated tag recommendations, Web interface design, and an expanded conception of how tags can be both expressed and semantically linked. We then report a detailed study which analysed the tool's usability and the tag structures created, contributing to our understanding of the implications of adding structure to collaborative tagging.

Categories and Subject Descriptors
H.1.2 [Information Systems]: User/Machine Systems—Human information processing; H.4.m [Information Systems]: Miscellaneous

Keywords
Social tagging, Sensemaking, Discourse Relations, Semantics, Argumentation, Usability, Pragmatic Web

Copyright is held by the author/owner(s). WWW2007, May 8-12, 2007, Banff, Canada.

1 INTRODUCTION
Our communities, local, national and international, are confronted by problems that are complex due to the changing environment, incomplete or ambiguous information, and stakeholders with different perspectives. Such domains include strategic planning in business, government policy formulation, time-pressured mission operations, and almost all scholarly research. The sensemaking activity that these contexts demand [25] requires analysts to construct plausible narratives that frame the problem, account for the available evidence, and motivate action, enabling "an openly reflexive forum in which communities of knowing explicitly talk about their understandings" [2]. Progress is made by making moves that express and contest interpretations of the world, although these different contexts clearly have very different genres of discourse and criteria for acceptance. The focus of this paper is on the work of academic researchers, but we argue that related work shows that an approach grounded in discourse relations is applicable to a broader range of applications.

It is established from corpus analyses that when researchers publish their interpretations of the ideas, opinions, claims or rebuttals in the literature, they are drawing on a repertoire of well understood moves, contributing to an extended discourse [21]. Although the Internet is accelerating the pace of exchanges, scholarly and scientific discourse still proceeds in the shadow of the printing press, with exchanges now disseminated as digital prose. While information retrieval and text analysis technologies help to infer certain kinds of structure within and between papers, our research is complementary, exploring a 'network-native' paradigm in which the key claims made by an author (and the interpretations made by their readers) are published as explicit new connections to the literature. The research question driving our work is: can we model the discourse structures we find in research communities as explicit structures, and if so, what support tools can we provide to construct, navigate, and interrogate such structures? Such approaches to knowledge publishing and negotiation on the Web will ideally be both quickly learnable and sufficiently expressive to permit researchers to make important scholarly moves, and to assist them in making sense of the emergent structures at scale.

What is especially interesting about scholarly discourse is the fact that "the truth" or "the significance" of any claim is open to contest. While this may be extreme in the case of philosophy and the humanities, it is self-evidently also the case even in computing and the hard sciences. There is no single reading of a paper: interpretations may differ significantly between readers and authors (hence the need for peer review), and readers bring their own unique perspective to a paper, seeing new connections that the author may never have intended. Seeing old things in new ways is the essence of creativity. This is the orientation we bring to harnessing the power of social tagging, augmented with discourse semantics, as we strive to create effective infrastructure for scholars to express—and contest—claims to knowledge.

2 SCHOLARLY TAGGING

2.1 Questions no search engine can answer
Consider the following questions that interest students and researchers, but which neither Internet search engines nor domain-specific digital libraries can assist in answering: What data refutes this hypothesis? Are there different schools of thought in this field? Is there an analogy between this process in fields X and Y? Why does this paper cite that one? How did these contrasting perspectives interpret this result?

The answers to these questions are grounded in the discourse moves that researchers make in their writing: the arguments, rhetoric and positioning of their claims with respect to the literature. In our present infrastructures, these are questions that can only be answered by reading the paper, although there is active research on the automated analysis of argumentative relationships between papers [22]. These are fundamentally issues of interpretation, which fall outside ontology-based Semantic Web approaches that model stable, consensus, 'objective' worlds (albeit always from a perspective). Nor can they be answered by scientometrics (e.g. citation analysis), which does not have enough insight into the nature of the moves being made. We are now squarely in the realm of pragmatics, where meaning derives from interpretation, perspective, contextualisation and argumentation—in other words, the construction of plausible narrative, as introduced at the start.

2.2 Discourse semantics for annotating claims
We take a hybrid, semiformal approach to add structure to freeform folksonomies. Details can be found in [4, 13, 24]:¹

1. As with folksonomies, tags remain unconstrained freetext strings, although users can choose to take care to reuse existing tags in order to increase the visibility of their tagging, or to discover new connections. In our context, however, tags may become phrases or even a sentence or two if they are used to express, for instance, a hypothesis, a prediction or a research result.

2. A critical difference is that tags may be linked not just to a URI, but to each other. We term a tag—relationship—tag triple a claim, that is, a meaningful connection being asserted between two ideas. A claim may also link from/to other claims, as the ideas grow in complexity. A claim is also directed: it has a source and a destination tag.

3. Tags are linked using a typology derived from argumentation and the most common moves made in research publications. Users select the relationship from a menu of predefined relationships (e.g. is consistent with, refutes, addresses, solves, improves on, is analogous to, uses/applies).

4. Tags may optionally be classified (e.g. problem, evidence, data, method, theory), but these are pragmatic, contextual roles, holding only in the context of a particular claim. Thus, in one context, a research result might be a problem, while, in another context, it might be an assumption.

¹ The work of ISO/TC37/SC4 shares a common interest in discourse and coherence relations: http://www.tc37sc4.org

2.3 Relation to our previous work
Elsewhere [4, 24], we have demonstrated how a digital library can be tagged in this way with annotation tools, the resulting network navigated via interactive visualizations, and the semantic searches enabled by modelling discourse relations (e.g. show papers that support, or contrast in some way with, this paper; show the lineage or ancestors of the idea represented by this tag). We have also evaluated how students make use of some of the tools to navigate and search prepopulated networks modelling a literature [4].²

Having demonstrated the potential of scholarly publishing and annotation using discourse relations to annotate texts, the challenge (as for any structured knowledge capture tool) was: can users do this? To date, we have not presented data on tag authoring behaviour. This paper reports the first quantitative and qualitative analysis of the ways that novices and experts approached semantic tagging in their first encounter with a software tool. Semantic tagging behaviour is inextricably linked to (1) the semantic scheme, as introduced above, and (2) the user interface and functionality of the tagging tool, introduced next.

² Demonstrations and screencasts: http://claimaker.open.ac.uk

2.4 ClaimSpotter
The previous annotation tools developed to support our tagging approach did not allow direct annotation of the target document. This was an explicit goal with ClaimSpotter.

Figure 1: The ClaimSpotter interface. Key: [1] the My/All Tags toolbar button highlights text matching tags on this paper (just the user's, or all tags), e.g. [2] "measures of trust in the content of Web resources". Clicking a highlighted tag enters it in the tag linking form. Similarly, the Relations filter [3], optionally applied to the whole document [4], highlights verbs matching, or synonymous with, the link types; e.g. [5] the verb "describes" in the text has been matched to the tag relation is about. Tag triples can be built from existing tags using the [X,.,.] and [.,.,X] buttons [6] to specify the left and right sides of the triple. The tag link is selected from the menu [7]. Notes can be saved as tags [8], which, like the document text, can on request be parsed for matching tags and relational types.
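The tagging scheme of Section 2.2 can be read as a small data model: freetext tags, optional contextual types, and directed, typed links whose ends may themselves be claims. The following Python sketch is our own illustration, not ClaimSpotter's published implementation; the class names, and the subset of relations shown, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union

# An assumed subset of the 36 predefined relations mentioned in the paper.
RELATIONS = {
    "is consistent with", "refutes", "addresses", "solves", "improves on",
    "is analogous to", "uses/applies", "is about", "is evidence for",
}

@dataclass
class Tag:
    text: str                       # unconstrained freetext: a word up to a sentence or two
    tag_type: Optional[str] = None  # optional contextual role, e.g. "problem", "evidence"

@dataclass
class Claim:
    source: Union[Tag, "Claim"]     # claims are directed: a source ...
    relation: str                   # ... linked by a relation chosen from the menu ...
    target: Union[Tag, "Claim"]     # ... to a destination; nesting a Claim yields a compound claim

    def __post_init__(self):
        # Relations are constrained to the menu, unlike the freetext tags.
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation!r}")

# The compound example reported later in the paper: a tag linked to another triple.
inner = Claim(Tag("Magpie"), "improves on", Tag("COHSE"))
outer = Claim(
    Tag("Magpie moves away from hypermedia towards open service-based architectures"),
    "is evidence for",
    inner,
)
```

The asymmetry is deliberate: tags stay folksonomy-like and unconstrained, while the connective tissue between them is the formal part of the scheme.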
Further examples and screencasts: http://kmi.open.ac.uk/projects/hyperdiscourse/tools/claimspotter

ClaimSpotter was designed to support document sensemaking tasks: reading, highlighting areas of potential interest, making notes, looking for specific kinds of papers in the bibliography, and so forth. While researchers clearly do this all the time on paper, or with freetext annotations in various document viewers, the challenge was to support users in these tasks with our semantic tagging approach. ClaimSpotter's design aims to initiate and sustain a dialogue between annotators and the target document, via (i) content-based support for tagging, in the form of recommendations, and (ii) an interface displaying these recommendations overlaid on the text (cf. Figure 1). Details are in [18, 19]; we turn now to the evaluation study.

3 USER EVALUATION STUDY
There are many types of evaluation. A summative analysis could evaluate technical performance (e.g. of recommendation agents), or characterise the impact of a tool on the practices of researchers (which one might do with mature tools like Google, Wikipedia or del.icio.us). We conducted a formative evaluation of a new prototype, in order to develop a language in which to describe as yet poorly understood phenomena. Our specific objective was to characterise how annotators approached the task we gave them with an unfamiliar tool, paying particular attention to how the affordances of the user interface (that is, the visual cues it provided for interaction) shaped tagging behaviour, summarised quantitatively against various measures and explained through qualitative coding of the data.

3.1 Methodology
We recruited 13 annotators (referred to as a1-a13), who used ClaimSpotter to annotate a 2-page research paper which they had preferably authored, or were at least very familiar with, to avoid any comprehension problems. Ten were PhD students, two were research fellows and one was a professor. None had used ClaimSpotter before. Four of them (a1-a4: one student, the two research fellows and the professor) were considered 'experts' in the tag linking scheme, being members of the project team. The remaining nine (a5-a13) were considered 'beginners'. Each session was limited to one hour. Screen interactions were recorded with a capture tool, and all comments and discussions were recorded, resulting in high quality audio-visual data as digital movies. A tutor (the first author) bootstrapped each annotation process by defining a few tags for each document. He was also present throughout the session to provide assistance when needed, and to engage in discussion when suggestions were made. A questionnaire sent one week after the experiment was designed to elicit opinions on the main strengths and weaknesses of the interface, and on the ways it could be improved. See [18] for more details.

4 QUANTITATIVE ANALYSIS
257 tags and 160 claims were submitted, giving on average 19.8 tags and 12.3 claims per annotator, with no major difference between the 4 experts and 9 beginners, the former entering marginally more tags (a mean of 20.75 against 19.3) and links (a mean of 14.75 against 11.2) than the latter.

4.1 Tags
Most tags submitted were short: 164 out of 257 (64%) were three words or fewer. Short tags (representing proper nouns, acronyms or project names) were submitted as frequently by novices as by experts. Most duplicated tags were used twice, while a handful were used three times. Duplicated tags were either created 'explicitly', by reusing a tag previously created in the current document, or 'implicitly', by typing a text string which happened to be already in use as a tag. However, the documents chosen by participants were so different that duplicates were mostly due to annotators reusing a tag created beforehand by the tutor. We also noticed that reused tags were not composed of short tags only: some longer tags were reused.

4.2 Tag triples ("claims")
22 relation types (out of the 36 available) were used; 7 of these 22 were used only once or twice. 'General' relations were the most frequently used, but it is difficult to generalise about raw frequency, as the papers considered were different. A more interesting aspect is to identify which relations were the most consistently used across annotators. The relations uses/applies/is enabled by and is about were the two most consistently used: only 3 annotators did not use the former at all, and only 4 did not use the latter.

The examples below demonstrate the variety of tag triples created by participants:

[Domain ontology, is about, A hierarchy of URIs on multiple levels]

[Universal physical access, is unlikely to affect, Digital divide]

[Hypertext node juxtaposition, is analogous to, Cinematic shot juxtaposition]

[(Evidence) In the Bristol trial, the awareness of the presence of other players was correlated with how much our participants enjoyed the game as well as with how engaged they felt, is consistent with, Presence awareness of many other people is capable of causing feel good factor]

[Magpie moves away from hypermedia towards open service-based architectures, is evidence for, [Magpie, improves on, COHSE]]

It can be seen that tags ranged from single words to a sentence, and are optionally given a type (cf. the fourth example). In the last example, a tag is linked to another triple to create a compound claim.

4.3 The is about link
If we consider conventional tagging on the Web, the assignment of a tag to a URI is semantically very close to simply asserting that the content is about that tag. We performed a detailed evaluation of the use of the is about link, since it was one of the most commonly used. It is what we might term a 'less committing' link compared to stronger, more argumentative relations such as challenges, proves, or is analogous to. This of course does not mean that is about links have little value: they have as much value as current tagging practices, and when used between two tags, such a connection can express a valuable and surprising stance if the two were previously thought unrelated.

Experts submitted proportionally fewer is about links than beginners, which we attribute to their greater awareness of the other links available. Beginners, by contrast, were more likely to use is about as a placeholder 'catch-all' link, especially when they had not yet established whether the link they had in mind was on the menu (see the user strategy 'Starting from the tags' discussed shortly). Those annotators who made more links made proportionately more is about links. In contrast, annotators who made fewer links made almost no use of them at all; it appears that they focused directly on forging stronger links.

If we divide each annotator's total link set in half, we find more is about links in the first half than in the second half. We interpret this as confirming the idea that this lower-commitment link helped to scaffold users into this new mode of tagging. 8 annotators out of 13 submitted at least one is about link. As they became more knowledgeable about the process and the links available, there seemed to be less need to fall back on is about. It can therefore be seen as a mechanism to incrementally formalize [20] one's tagging. We can imagine ClaimSpotter prompting annotators at a later stage to review whether to 'upgrade' is about links to more specific ones.

5 QUALITATIVE ANALYSIS
The qualitative analysis focused on the audio-video data. We used a shallow Grounded Theory methodology to code the video transcripts (to create concepts) and organise them (in order to draw relations between these concepts) [7]. The outcome of this methodology was, in Grounded Theory terms, a 'theory', that is, a set of plausible relationships holding among multiple concepts. Concepts emerged from the analysis and were constantly compared against each other, through specialization of codes into sub-codes or, vice-versa, consolidation of sub-codes into parents (called categories). Finally, a stable state (the point of theoretical saturation) was reached, where the codes were judged to account for the salient phenomena. The final taxonomy is given in Table 1, providing a more nuanced vocabulary than was available prior to the study in which to describe users' tagging behaviour with ClaimSpotter. Discussion is organised around the three top-level themes: Formalization, User Strategy and Interaction Design.

Table 1: Extract of the data coding scheme which emerged from the analysis of tagging behaviour: themes, categories, sub-categories and codes.

Formalization
    creating a tag
        choosing a tag type
            appropriate tag type
            not perfect tag type
            problem with or lack of a tag type
            cannot find a tag type
        removes tag type
        deletes tag
    creating a claim
        choosing a relation
        removing a claim
        ...
    discussion about formalism
User Strategy
    keeping things simple
        reducing amount of information on screen
    looking for ideas
        focussing on a particular area
        hiding an area
        ...
    starting a claim from the tags
    starting a claim from the relation
    typing or selecting a tag
    incremental formalization
    reusing a tag or a claim previously submitted
    ...
Interaction Design
    consistency
    feedback
    ...
Miscellaneous
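The first-half versus second-half comparison used in the is about analysis of Section 4.3 is straightforward to state as code. The sketch below is our own, with an invented link sequence for a hypothetical annotator, not the study's data:

```python
def is_about_shares(links):
    """Fraction of 'is about' links in the first and second half of a
    chronologically ordered sequence of relation names."""
    mid = len(links) // 2
    first, second = links[:mid], links[mid:]
    return (first.count("is about") / len(first),
            second.count("is about") / len(second))

# Invented sequence: low-commitment 'is about' links early in the session,
# stronger argumentative relations later.
links = ["is about", "is about", "is about", "uses/applies",
         "is about", "refutes", "improves on", "is analogous to"]

first, second = is_about_shares(links)
assert first > second  # the scaffolding pattern the study reports
```

A tool could run exactly this check per annotator to decide when to prompt for 'upgrading' is about links to more specific relations.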
5.1 Theme 1: Formalization
The analysis of behaviours grouped under Formalization yields insights into the degree of cognitive effort it took users to use the new structured tagging scheme.

Assigning types to tags
Most users decided not to add a tag type simply because it was optional: types were assigned 34 times, out of a total of 257 tags. Twice, types were explicitly not assigned because there were too many ("The interesting thing is that this specific example (tag) could fall in different categories.") and once because there were not enough ("It's not a problem, it's not a solution, and it's not a methodology. I'd like something that says research field"). Search was not part of this evaluation task (it was the focus of a previous experiment [23]). We have not yet gathered longitudinal data with extensive tag authoring and searching, but we hypothesise that as users learn that they can search on types (e.g. find all instances where this tag was considered an assumption), they might start to assign them in anticipation. This is analogous to expert users formulating compound specializations of tags in Web social bookmarking: users will willingly add tag complexity as it serves their anticipated needs.

Relation types
An appropriate relation was found in 115 occurrences, out of 160 in total. However, just as we found with choosing a tag type, we observed difficulties in choosing a relation type:

• On 8 occasions, a 'good enough' relation was found. This means that the annotator kept and submitted the triple, although it did not express completely what she had in mind ("I can say is similar to, since there is nothing else better than that");

• On 6 occasions, the problem was more acute: "The relation (that I want) is not there. So what do we do?" This resulted in the removal of the whole triple that was being created.

Multiple attempts were sometimes needed to get a claim right. This implied trying different relations and finding out which one looked (and, actually, sounded, as annotators very often said them aloud) best, flipping the source and destination tags, or reformulating a tag to make it suit a given relation. We recorded 11 incidents in which an annotator had to reformulate the wording of a tag because of a relation.

'Good' and 'bad' tags
One annotator commented that a tag she was considering adding was "a silly tag" but that she would "make it anyway", because it was of interest to her. She then added: "I'm not sure if that tag's going to be good. Maybe some of these tags are less useful than the others." Prompted to comment on her notion of tag utility, her answer was most interesting: "A good tag will be something that is consistent, something that would appear again and again in the document. [Tag name] is a good tag for instance, compared to something I would use only once." This notion of quality derived from potential reusability, which is clearly the conventional understanding that users bring to tagging. It puts a premium on short tags referencing real-world entities, such as the names of theories, algorithms, problems or methods. These are, of course, the sorts of entities that are extractable automatically, compared to the more complex tags that ClaimSpotter supports, which were more novel to users and were used less frequently. In devising an interface for more subjective, interpretative tags, this comment gave us pause for reflection on how the interface could have encouraged richer tags, to move users beyond the stereotype. See also the later discussion on the bias we unwittingly gave in the user interface to short, matched tags, which reinforced this emphasis.

5.2 Theme 2: User Strategy
Users are hard to predict: each brings his or her own unique knowledge of their domain, and varying expectations about the formalism and tool. Although we might have expected as many strategies as we had annotators, we believe we have identified several patterns.

Roles played by recommendations
We noted a difference in the amount of support annotators wanted from the interface, and from its ability to extract and 'recommend' elements through text highlighting. Beginner annotator a7 made little use of the recommendations and spent most of the experiment inputting her own tags and claims, while all the other participants did use the suggestions. Expert annotator a1 preferred at one point to deactivate the suggestions because, in her words, "I don't want to be distracted by having too many things going on. At the moment, it seems to be quite complicated. I'd rather keep it simple." Later, however, she made use of the recommendations "to see if there's anything inspirational (in this part of the document)".

Recommendations were typically used to reduce the document to a set of potentially interesting focal fragments. They could also be activated to discover (and reuse) existing tags, to position an argument with respect to peers' tags, to find out how a particular tag was used over the corpus, to find peers' tags and claims, to indicate which tags were associated to a cited document, or to indicate how a cited document was assessed by its author.

Incremental formalization
Tags and claims were not necessarily submitted immediately. Instead, they were often kept on the screen because annotators felt the need to see them to facilitate the creation of claims. Saying the relations aloud was also a phenomenon we often noticed, as mentioned earlier.

Another strategy-related phenomenon concerned the order in which annotators accessed the different resources at their disposal. They seemed to focus first on making their own annotations (possibly to get their feet wet with the formalism) before browsing through the history and looking for relevant tags and claims from their peers. This may have been an experimental artefact (the need to 'get something done' within the hour): "For the time given, the easiest thing is to see the system suggestions and make your own. Because going back and looking through the history may just take too much time." But it may also have to do with a desire to appropriate the document first, to make it their own, before turning to what their peers said about it.

Starting from a relation vs. starting from tags
We also observed a striking difference between how (mostly) experts started from the relation type they wanted to use for a claim, and how (mostly) beginners started from the two tags they wanted to put in relation, without knowing if the relation type they wanted existed. On reflection, this phenomenon is not surprising, but this was the first empirical evidence we had of it.

Towards a new kind of annotation process?
Although a few of the claim-spotting filters did exhibit some unwanted results on the papers provided by the annotators, the visual noise levels were not as damaging as had been feared. What was of interest to us was whether the very presence of recommendations and peers' tags shaped annotators' behaviour. We characterise the effect that highlighted tags had as follows: from a situation in which annotators are given no cues as to how to tag a document, we moved to a situation in which they had to decide whether an existing tag was good material to make a tag or a claim. We felt that annotation moved towards making a Yes/No decision in response to each recommendation. We will revisit this point later.

5.3 Theme 3: Interaction Design
We studied in detail the annotators' interactions with the interface, and concluded that the environment was reasonably intuitive within the constraints of the recorded task of annotating a single document. Longitudinal evaluation with large tag sets will undoubtedly reveal other design weaknesses.

Successful features
The presence of pull-down menus of tag types and relations on the screen succeeded as a visual scaffold: "I'm looking through the types because I'm not familiar with them." The presence of the multiple tag types available also drew annotators' attention to specific aspects of the paper that they might choose to focus on, e.g. what is the problem tackled, or the methodology proposed? The tag-linking features were also very successful, encouraging a playful approach: the act of combining and swapping tags between the left and right sides of the link was made easier by not having to retype them, and introduced a bricolage aspect that encouraged experimentation.

As users gain confidence with a tool, they develop interaction routines, that is, compilations of micro-actions. These routines provide us with another way to describe the coupling between user interface and structured tagging.

Annotating and checking for visual feedback
This move illustrates the dominance of 'visible' tags, which, as discussed, had not been foreseen. Users would select, copy and paste some text from the document into a tag, submit it, and immediately activate the 'my tags' filter to see it appear highlighted in the text, confirming that it had been recorded.

Navigating and tagging by document section
A simple routine was navigating via the contents menu to a particular section, reading or skimming it, and summarising it via a tag. This enabled a user to work through the text systematically, and confirmed the value of integrating the document and the annotation in a seamless interface.

Navigating and tagging by recommendation
A variation on this was to work from the output of filters: switching on a filter, looking at a highlighted area in the document, reflecting on it, modelling a tag or a claim, and moving to the next highlighted area. This sequence again confirmed the ability to move fluidly between engaging with the document and tagging, with highlighted tags in the text acting as attention-catchers.

Combining tags into claims
The process of claim authoring evolved into a recognisable pattern: creating a tag, creating another tag, combining them in a claim, looking for a discourse link, not finding one, flipping the order of the tags in the relation, and finally finding an appropriate relation.

Reusing and adapting peers' tags
Some users learnt to use the less obviously available history window (listing, among other things, non-matched tags). Consulting the tags available and reusing one or more in one's own tag space demonstrated that annotators did benefit from peers' tags.

6 DESIGN WEAKNESSES
In this section, we reflect on some of ClaimSpotter's design weaknesses, and consider improvements that may also be of relevance to other collaborative knowledge structuring tools.

6.1 Information overload?
ClaimSpotter's filters were designed to address the challenge of supporting an annotator in the task of locating and tagging a document's contributions. The presence of highlighted tags and text fragments undoubtedly shaped the annotation process, and we have evidence that annotators valued seeing these, with some variation in when they activated them. We did find evidence, however, that there may have been too much information. As mentioned by a4: "the problem is, do you make your own claims, do you follow the system, do you go back to the history to see what the other people have said?"

There is no question that, for a one-hour experiment, there was indeed a lot of information to understand and digest. More studies are needed to introduce the different sources of support more gradually, and to let annotators decide which ones work best for them. Better ways to organise these recommendations also need to be devised (work has begun on a dialogue assistant that helps annotators ask themselves focused questions about the document, and which suggests recommendations for each question).

6.2 'Current-document centeredness'
New users will focus on what they are offered by the display. ClaimSpotter's document-centric design emphasised the current document, at the expense of easy access to cited documents, for instance. Our conclusion is that this resulted in a limited number of claims connecting tags originating in different documents. However, our other work has evaluated user interfaces that foreground the tag space structure, providing a complementary perspective [23].

6.3 User 'laziness'
Our objective was to devise a more active interface to suggest possible tags. We now play devil's advocate and ask whether tags and claims would not be more reflective if they had to be devised manually by the annotator. By saving the annotator the cognitive effort of formulating their own tags, are we undermining the very process we want to promote?

We observed a tendency to create (i.e. reuse) tags from text fragments highlighted in the document by the recommendation filters. Some of these were copied and edited to taste, but they were nevertheless heavily inspired by the highlighted elements in the original document. While this seems to be a 'good' thing, both in terms of usability (it lowers the barrier to constructing semantic literature models) and in terms of building a network promoting the reuse of tags, there is a corresponding risk that less effort is put into the annotation: the user comes to expect the system to bring her the salient facts about a document (whether these are composed of important sentences, or of matched existing tags). While this may represent a new paradigm for scanning and tagging documents, we are also cautious about the implications. Lazy annotators may be tempted to accept suggestions without critically assessing them, resulting in the propagation of poor tags. Within an educational context, one possibility would be to keep tag suggestions and automatic text highlighting at an imperfect level, to maintain students' vigilance.

6.4 Interface bias towards 'matched' tags
Let us now consider the 'matched tags' recommender. Matched tags (exact matches, as with most social tagging tools) were privileged in the user interface over non-matched ones: the former were visible via the activation of a filter and highlighted in bright yellow zones directly in context within the document, while the latter were 'hidden' in the separate history window. Matched tag highlighting gives immediate feedback to annotators, and the satisfaction of seeing one's tags highlighted in the text is akin to that gained in social bookmarking when one's tagged pages show up with the rest of the world's. However, we again raise the question of the quality of tagging, whereby the emphasis could shift from reflectively submitting new tags to submitting 'visible' tags (that is, tags matched by the dedicated recommender). Better presentation options must be devised, including a mechanism to display 'non-matched' tags in the main window.

7 RELATED WORK
…tools that forge a link between argumentation and current Web annotation tools and practices [11].

Our work builds on research into readers' annotation practices, in which annotation is a means to record personal ideas and interpretations, including connections to additional scholarly documents, reformulations of the author's arguments, assessments of its significance, or 'warning' signals to indicate key passages [14]. However, we are exploring the representational and interactional requirements for tools that enable these personal perspectives to be made public as a semiformal network that can be managed, extended, and contested. Current annotation tools [16] provide no support to manage what might be thought of as large-scale annotations on annotations.

Ontology-based annotation tools are being developed as an essential part of the Semantic Web movement. However, these applications may in fact be better characterised as supporting the 'translation' of information in the document into ontological entities. Although there may be debate about how to map an entity into an ontology, the material itself is not normally the focus of contention (such as the names of people, events, locations, processes). The tools certainly do not aim to support debate about the significance or meaning of an entity in a document.

Our use of recommendation filters derives from work on the summarisation of scientific papers. Potentially relevant passages can be delimited with multiple approaches, based on (i) the structure of the (scholarly) document [1], (ii) surface-based features [11], (iii) topical coherence [17], and (iv) rhetorical coherence measures [22]. Other work on literature-wide analysis on which we could draw includes the identification of relevant documents by analysing their citation sections [9, 12]. Pivotal points can also be proposed to filter a network of documents and retain only the most important ones [5]. Nanba et al. [15] also propose an approach to both identify reference areas and the role [26] played by these areas. They consider the following roles: references indicating other researchers' theories or methods used as a basis, references to related works to mention a
Although this has contrast or a problem and other references. not been verified, it may be that the user interface design led Since researchers clearly need to annotate domain annotators to forget that there might be other tags: it terminology, Semantic Web annotation tools are part of the certainly did not actively remind them. This may have led solution. In CREAM [8], an annotation by mark-up mode is them to submit more ‘copied-and-pasted’ tags. This added provided, enabling the user to select any piece of relevant focus on the visual salience of highlighted text spans may information from the page and drag and drop it to create or also mean that matched tags became a way to cover the instantiate the selected concept instance (researcher name, document with tags. By doing this, annotators received address…) Text fragments are extracted from the page to implicit feedback that they had read the document. foster a semi-automatic annotation: the knowledge expert agent only has to validate the extracted elements. However, following the social tagging paradigm, 7 RELATED WORK annotators in our approach will tag only those elements in a Our work is one strand in research on computational text that reflect their interests (there is no gold standard set modelling of argumentation (e.g. COMMA [6]), but while of tags that can be automatically extracted, since there is no other work focuses on the formalization of human or agent single, authoritative meaning). As we have argued on argument structures and processes, we place more emphasis theoretical grounds elsewhere, the representational on interaction design, and on the development of software requirements for modelling discourse are different [13]. 
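The representational scheme running through this discussion — freeform tags combined into claims via a small set of predefined discourse relations (‘research moves’), with ‘matched’ tags being those found verbatim in the text — can be captured in a minimal data-model sketch. The class names, relation labels and matching rule below are illustrative assumptions for exposition only, not ClaimSpotter’s actual implementation:

```python
from dataclasses import dataclass

# Illustrative 'research move' relations; ClaimSpotter's actual
# relation vocabulary is not reproduced here.
RELATIONS = {"proves", "refutes", "extends", "uses", "is-about"}

@dataclass(frozen=True)
class Tag:
    """A freeform tag, possibly copied from a highlighted text fragment."""
    author: str
    text: str
    source_doc: str

@dataclass(frozen=True)
class Claim:
    """A claim triple: subject tag -- discourse relation --> object tag."""
    subject: Tag
    relation: str
    obj: Tag

    def __post_init__(self):
        # Relations are predefined so claims stay filterable and searchable.
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown discourse relation: {self.relation}")

def flip(claim: Claim) -> Claim:
    """The 'flip the order of the tags' step seen in the authoring pattern."""
    return Claim(claim.obj, claim.relation, claim.subject)

def is_matched(tag: Tag, document_text: str) -> bool:
    """A tag is 'matched' when its text occurs verbatim in the document."""
    return tag.text.lower() in document_text.lower()
```

The point of the triple structure is that the relation, not just the tag, carries the annotator’s interpretive move — which is precisely what flat, entity-oriented annotation schemes leave out.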
This work is therefore better framed not so much as Semantic Web (controlling interpretation through consensus domain models) than as Pragmatic Web (foregrounding context, argument, interpretation and perspective) [3].

8 CONCLUSIONS AND FUTURE WORK
We offer this analysis as an example of a human-centred design process for collaborative knowledge structuring environments. We hope that the particular approach we are developing contributes to wider efforts to add greater representational expressiveness to social tagging, without in the process straitjacketing it.

Social bookmarking via freeform ‘folksonomic’ tagging is demonstrating its huge potential for collective indexing of materials through emergent vocabularies. In our approach, we have preserved the freedom that folksonomic tagging permits in what counts as a ‘tag’, added the option to classify tags, and introduced the option to link tags using familiar ‘research moves’, predefined in order to leverage automated filtering and search. The ClaimSpotter prototype supports the collaborative annotation of documents using this representational scheme. We have summarised a detailed analysis of how annotators made use of the tool in their first hour of usage, describing the results under the themes of Formalization, User Strategy and Interaction Design.

This work is being developed in several directions. There is clearly scope to improve the interface design, and to add the kinds of flexibility that we see in social tagging interfaces, such as recording tags as private, personalising recommendation filters, and enabling richer user profiles. ClaimSpotter is one of a suite of tools being developed in the Hypermedia Discourse project (http://kmi.open.ac.uk/projects/hyperdiscourse), in which we are now developing a server to provide coherence relations-based tagging services, which we conceive as a form of web pragmatics (http://www.pragmaticweb.info).

We are also testing the generality of the approach outside scholarly discourse, exploring the use of recommendation filters and discourse links in the Laboranova project (http://www.laboranova.com), which is focussing on the early stages of innovation, when ideas are developed, debated, improved and evaluated. We are exploring the possibilities of introducing stimulus agents for serious games to strengthen proposals for innovation development by suggesting argumentative connections between ideas, supporting examples, diagnostic tool outputs or relevant experts.

9 ACKNOWLEDGMENTS
We are grateful to the reviewers for their helpful feedback. This research was supported by the Advanced Knowledge Technologies (AKT) project, an Interdisciplinary Research Collaboration (IRC) sponsored by the UK Engineering and Physical Sciences Research Council (GR/N15764/01). The AKT IRC comprised the Universities of Aberdeen, Edinburgh, Sheffield, Southampton and The Open University.

10 REFERENCES
[1] P. Bishop. Digital Libraries and Knowledge Disaggregation: the Use of Journal Article Components. In Proceedings of the 3rd International Conference on Digital Libraries. ACM, 1998.
[2] R. J. Boland and R. V. Tenkasi. Perspective Making and Perspective Taking in Communities of Knowing. Organization Science, 6: 350–372, July 1995.
[3] S. Buckingham Shum. Sensemaking on the Pragmatic Web: a Hypermedia Discourse Perspective. In Proceedings of the 1st International Conference on the Pragmatic Web. GI Lecture Notes in Informatics, September 2006.
[4] S. Buckingham Shum, V. Uren, G. Li, B. Sereno, and C. Mancini. Computational Modelling of Naturalistic Argumentation in Research Literatures: Representation and Interaction Design Issues. International Journal of Intelligent Systems, 22(1): 17–47, 2006.
[5] C. Chen. The Centrality of Pivotal Points in the Evolution of Scientific Networks. In Proc. Int. Conf. Intelligent User Interfaces, pages 98–105. ACM.
[6] COMMA: 1st Int. Conf. on Computational Modelling of Argumentation (Sept. ’06), Liverpool, UK. IOS Press.
[7] B. G. Glaser and A. Strauss. Discovery of Grounded Theory: Strategies for Qualitative Research. Sociology Press, 1967.
[8] S. Handschuh and S. Staab. Authoring and Annotation of Web Pages in CREAM. In Proc. WWW2002: 11th Int. World Wide Web Conference, 2002.
[9] S. Hitchcock, L. Carr, Z. Jiao, D. Bergmark, W. Hall, C. Lagoze, and S. Harnad. Developing Services for Open Eprint Archives: Globalisation, Integration and the Impact of Links. In Proceedings of the 5th Int. Conference on Digital Libraries. ACM, 2000.
[10] C. M. Hoadley and M. C. Linn. Teaching Science Through Online, Peer Discussions: SpeakEasy. In The Knowledge, January 2005.
[11] J. Kupiec, J. Pedersen, and F. Chen. A Trainable Document Summarizer. In Proceedings of the ACM SIGIR’95 Conference, pages 68–73. ACM, 1995.
[12] S. Lawrence, C. L. Giles, and K. Bollacker. Digital Libraries and Autonomous Citation Indexing. IEEE Computer, 32(6): 67–71, 1999.
[13] C. Mancini and S. Buckingham Shum. Modelling Discourse in Contested Domains: a Semiotic and Cognitive Framework. International Journal of Human Computer Studies, 64(11): 1154–1171, 2006.
[14] C. C. Marshall. Annotation: from Paper Books to the Digital Library. In Proceedings of the 2nd ACM International Conference on Digital Libraries, pages 131–140, Philadelphia, PA, USA, 1997. ACM.
[15] H. Nanba and M. Okumura. Towards Multi-Paper Summarization using Reference Information. In Proceedings of the IJCAI’99 Conference, pages 926–931, 1999.
[16] I. Ovsiannikov, M. A. Arbib, and T. H. McNeill. Annotation Technology. International Journal of Human Computer Studies, 50(4): 329–362, 1999.
[17] G. Salton, A. Singhal, C. Buckley, and M. Mitra. Automatic Text Decomposition Using Text Segments and Text Themes. In UK Conference on Hypertext, pages 53–65, 1996.
[18] B. Sereno. A Document-Centric Semantic Annotation Environment to Support Sense-Making. PhD thesis (also available as Technical Report KMI-06-13), Knowledge Media Institute, The Open University, Milton Keynes, UK, September 2005.
[19] B. Sereno, S. Buckingham Shum, and E. Motta. ClaimSpotter: an Environment to Support Sensemaking with Knowledge Triples. In Proc. Int. Conf. Intelligent User Interfaces, pages 199–206. ACM.
[20] F. M. Shipman and R. McCall. Supporting Knowledge Base Evolution with Incremental Formalization. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 285–291. ACM, April 1994.
[21] J. M. Swales. Genre Analysis: English in Academic and Research Settings. Cambridge University Press, 1990.
[22] S. Teufel and M. Moens. Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status. Computational Linguistics, 28(4): 409–445, December 2002.
[23] V. Uren, S. Buckingham Shum, G. Li, and M. Bachler. Sensemaking Tools for Understanding Research Literatures: Design, Implementation and User Evaluation. International Journal of Human Computer Studies, 64(5): 420–445, 2006.
[24] V. Uren, S. Buckingham Shum, G. Li, J. Domingue, and E. Motta. Scholarly Publishing and Argument in Hyperspace. In Proc. WWW2003: 12th Int. World Wide Web Conference, May 20–24, 2003, Budapest.
[25] K. Weick. Sensemaking in Organizations. Sage Publications, Thousand Oaks, CA, 1995.
[26] M. Weinstock. Citation Indexes. In Encyclopedia of Library and Information Science, volume 5, pages 16–40, 1971.