Annotations in the wild

Laurent Denoue (1) and Laurence Vignollet (2)

(1) FXPAL, Palo Alto, CA USA
(2) Syscom, University of Savoie, France

Abstract. We believe that storing web annotations on annotation servers limits the widespread adoption of web annotation technology. Instead of relying on annotation servers, we propose to encode annotations as an extended URL. Because they follow the standard URL encoding, these extended URLs can be readily used and embedded in Web documents. We then envision specialized search engines that would index these extended URLs and provide new services on top of them.

1 INTRODUCTION

On the Web, you can create a link using a standardized format (i.e. a URL) and embed it into your documents without relying on an external server. You can publish your links whenever you wish simply by making the document publicly available. Others can then leverage the links you have created, for example to build better web search engines (e.g. Google).

But what would the Web be if every link you created had to be stored on a link server? What would it be if everyone used a different scheme to encode a link? What would it be if you could not embed a link into your documents? Sadly, today's annotation systems are implemented with this model:
• Users rely on a remote annotation server to create an annotation.
• Once created, the annotation is not given a unique URL: users cannot easily link to it from their documents.
• Every annotation system uses its own format and set of APIs to retrieve annotations, making it hard for third-party applications to index and reuse these annotations.

In this paper, we argue for the same bottom-up approach that made the Web so popular: the possibility for anyone to link to any document from any document:
• Users don't need a remote service to create hyperlinks: hyperlinks are self-contained.
• Hyperlinks can be embedded into documents.
• Hyperlinks are standardized.

Here, we propose to extend hyperlinks so that they encode both the URL of the annotated page and the annotation made on this page.

2 SELF-CONTAINED ANNOTATIONS USING EXTENDED URLS

An annotation can be described as a pair (anchor, description). The anchor encodes the attachment point of the annotation (e.g. the whole document or a part of it); the description is what the user attaches to this anchor (e.g. a textual comment). The description is optional: for example, the anchor alone is sufficient if users want to highlight a part of a document without adding a comment.

We propose to use the existing URL format to encode an annotation, and call the result an extended URL. In the current URL format, it is possible to point to a specific part of a document by specifying a tag name after the # sign in the URL. The Web browser extracts this tag name and uses it to locate the element with the same name in the current Web page. We propose to use the same mechanism to allow anybody to point to any part of a document.

In our scheme, the simplest extended URL looks like:

http://www.cnn.com#anchor=

where http://www.cnn.com is the document being annotated and the value of the anchor parameter is a string allowing the browser to find a specific part of this document. We describe how this value is built later. Note that this scheme is compatible with the popular GET method, where parameters are passed after the ? sign: the # sign is always appended after the query string, as in:

http://www.cnn.com?id=12032002#anchor=

The description of the annotation can also be included in the extended URL. For example:

http://www.cnn.com#anchor=&description=my%20comment

annotates the part of the document http://www.cnn.com identified by the anchor value with a description containing “my comment”. The description can also include a URL, allowing users to annotate a document with any object identifiable by a URL (e.g. a picture, a video, a program).

3 ENCODING PARTS OF A DOCUMENT IN THE EXTENDED URL

After identifying a specific resource with a URL, the anchor also needs to identify a part of the document pointed to by the URL. Because annotated documents can change, the encoding should allow a user or an application to:
• Detect that the anchor point is no longer valid.
• Attach the annotation to the new version of the document.

XPointers [1] have been proposed to link to sub-parts of a document. But because they mainly rely on the structure of the document, we think that XPointers have two limitations:
• It is hard to attach the annotation when the structure of the document changes.
• The content pointed to by an XPointer is not human-readable.

Following research by David Bargeron [2] and Ping Lee [3], we suggest a more “human-level” encoding. For example, a highlight in a document could be encoded simply as the highlighted string. Because the string is not always unique, the encoding can also keep the rank of the string in the whole document. Thomas Phelps and Robert Wilensky at UC Berkeley have designed robust encoding strategies based specifically on the content of a document [4]. Although we envision a “human-level” encoding for anchoring annotations, we also acknowledge that some applications will need to anchor annotations specifically to the structure of the document; XPointer-like encodings can then be used.

4 IMPLEMENTING EXTENDED URLS

We have extended Yawas [5] to let users create extended URLs. Currently, Yawas only supports text highlighting. From a Web browser, the user selects some text and chooses “Highlight” from a context menu. Yawas creates an extended URL for this highlight and automatically copies it to the system clipboard. This approach allows users to quickly embed their annotations into emails or other documents. Users can also create many annotations on one document and export all of them.

The encoding is quite simple in our prototype. From the highlighted text, we keep the first 10 and last 10 characters, followed by the original length of the highlighted text, followed by the rank of this string in the whole document.

Users who have Yawas installed on their machine can access these annotations. Because no search engine is available for retrieving the annotations attached to a web page, users currently list the URLs of documents in which Yawas should look for annotations. Users can also receive an annotation by email in the form of an extended URL. When they click on the link, the web browser opens the page; current web browsers silently ignore the parameters after the # sign. In the current implementation of Yawas, users choose “Import” from the context menu in Internet Explorer and paste the extended URL. Yawas then parses the string and highlights the corresponding parts of the document. We are working on a new version in which Yawas-enabled browsers will understand these extra parameters and highlight the corresponding passage without the user having to paste the extended URL manually.

Users can also create more than one annotation on a web page A, store them in a web page B and send a link to page B by email. The recipient of the email just clicks on the link and the browser loads page B. When page B is loaded, Yawas collects all extended URLs embedded in this page and stores them in a local annotation file. Using the REFRESH meta tag in HTML, page B can automatically redirect the recipient to page A, the page that was originally annotated. Because Yawas has already parsed page B and extracted its extended URLs, the highlights are then automatically displayed in page A.

5 CONCLUSION AND FUTURE WORK

Our implementation is meant to demonstrate the flexibility of using extended URLs to embed annotations in any document that supports hyperlinks (e.g. HTML and Word documents). Although the Web encourages sharing, we should keep in mind that annotations should not only be shareable through a web server. We believe that forcing users to share them this way keeps them from adopting web annotation technologies. A typical limitation is privacy: annotating a document is a personal activity, and not all users would be willing to share all the annotations they create. Unfortunately, most annotation systems designed so far promoted the sharing of annotations, forgetting why people annotate in the first place.

Bookmarking is a particular form of annotation. People accumulate personal bookmarks over time. After a while, they might consider a subset worth sharing with others and decide to publish a Web page containing their best bookmarks.

We hope that designers of annotation systems will externalize their annotations in the form of extended URLs. Automatic ‘agents’ could then crawl the Web to retrieve these extended URLs and start building an annotation search engine, on top of which many services could be implemented. For example, we showed in previous work how to improve Web page classification by using annotations (rather than the full text) [5]. It would be very interesting to extend this research by using annotations created by many users. More generally, annotations tell us not only which documents users like, but also what in a document they like.

Here, we proposed a simple solution for encoding the anchor point of an annotation based on the content of the document rather than its structure. Obviously, other techniques can be used to perform this encoding. Instead of having to choose one over the others, one solution would be to publish the decoding algorithm for each, so that the research community can investigate different encoding schemes without preventing annotation technology from being deployed.

ACKNOWLEDGMENTS

We would like to thank all members of the w3c-annotation list, who gave us interesting ideas and feedback on the design of annotation systems.

REFERENCES

[1] World Wide Web Consortium, ‘XML Pointer Language (XPointer) Version 1.0’ (2001).
[2] D. Bargeron, ‘Robust Annotation Positioning in Digital Documents’, Proceedings of CHI 2001, 285-292 (2001).
[3] L.P. Lee, ‘CritLink: Better Hyperlinks for the WWW’, Proceedings of Hypertext 98 (1998).
[4] T. Phelps and R. Wilensky, ‘Robust Hyperlinks Cost Just Five Words Each’, UC Berkeley Computer Science Technical Report UCB//CSD-00-1091, Berkeley, USA (2000).
[5] L. Denoue and L. Vignollet, ‘Personal Information Organization using Web Annotations’, Proceedings of Webnet 2001, Orlando, Florida (2001).
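The extended URL syntax of Section 2 and the content-based anchor encoding of Section 4 (first 10 and last 10 characters, length, rank) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Yawas implementation. The following are all hypothetical: the “prefix|suffix|length|rank” layout of the anchor value, the “|” separator, the definition of rank as the number of earlier positions sharing the same prefix, suffix and length, and the function names encode_anchor, decode_anchor, build_extended_url and parse_extended_url. Decoding returns None when no matching passage remains, which is one simple way to detect that an anchor point is no longer valid (Section 3).

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, quote, urlsplit

EDGE = 10  # characters kept from each end of the highlighted text (Section 4)


def _matches(document: str, i: int, prefix: str, suffix: str, length: int) -> bool:
    """True if document[i:i+length] starts with prefix and ends with suffix."""
    return (i + length <= len(document)
            and document.startswith(prefix, i)
            and document.endswith(suffix, i, i + length))


def encode_anchor(document: str, start: int, end: int) -> str:
    """Encode the highlight document[start:end] as 'prefix|suffix|length|rank'.

    Hypothetical layout: rank counts earlier positions that share the same
    prefix, suffix and length, so duplicate passages can be told apart.
    """
    text = document[start:end]
    prefix, suffix, length = text[:EDGE], text[-EDGE:], len(text)
    rank = sum(_matches(document, i, prefix, suffix, length) for i in range(start))
    return f"{prefix}|{suffix}|{length}|{rank}"


def decode_anchor(document: str, anchor: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the anchored passage, or None if it is no longer valid."""
    # length and rank are integers, so the last two '|' are always separators,
    # even if the highlighted text itself contains '|'.
    head, length_str, rank_str = anchor.rsplit("|", 2)
    length, rank = int(length_str), int(rank_str)
    k = min(EDGE, length)  # prefix and suffix are each k characters long
    prefix, suffix = head[:k], head[k + 1:]
    seen = 0
    for i in range(len(document) - length + 1):
        if _matches(document, i, prefix, suffix, length):
            if seen == rank:
                return i, i + length
            seen += 1
    return None


def build_extended_url(page_url: str, anchor: str,
                       description: Optional[str] = None) -> str:
    """Append the annotation after '#', percent-encoding each parameter value."""
    fragment = "anchor=" + quote(anchor, safe="")
    if description is not None:
        fragment += "&description=" + quote(description, safe="")
    return f"{page_url}#{fragment}"


def parse_extended_url(url: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split an extended URL into (annotated page URL, anchor, description)."""
    parts = urlsplit(url)
    params = parse_qs(parts.fragment)  # also decodes the percent escapes
    page_url = parts._replace(fragment="").geturl()  # keeps any '?query' part
    return (page_url,
            params.get("anchor", [None])[0],
            params.get("description", [None])[0])


if __name__ == "__main__":
    doc = "Annotations in the wild. Annotations are personal. Annotations in the wild are rare."
    start = doc.rindex("Annotations in the wild")  # highlight the second occurrence
    anchor = encode_anchor(doc, start, start + len("Annotations in the wild"))
    url = build_extended_url("http://www.cnn.com", anchor, "my comment")
    print(url)
    page, parsed_anchor, comment = parse_extended_url(url)
    print(page, decode_anchor(doc, parsed_anchor), comment)
```

Because every parameter value is percent-encoded, the whole annotation remains an ordinary URL that can be pasted into an email or embedded in a page, and a browser that does not understand the parameters after the # sign still opens the annotated page. Other anchor encodings could be swapped in by replacing encode_anchor and decode_anchor while keeping the same URL layer.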