Annotopia: An Open Source Universal Annotation Server for Biomedical Research Paolo Ciccarese and Tim Clark Massachusetts General Hospital and Harvard Medical School, Boston MA paolo.ciccarese@gmail.com; tim_clark@harvard.edu Abstract Annotopia is an open source, open services platform for creating, managing, manipulating and sharing open annotation using the W3C Open Annotation Data Model. It can create and/or manage annotation of HTML, PDF, and other resources including data and ontology concepts, with text, semantic tags, and other annotation types. It supports fine-grained permissions on annotations. Annotopia is a Swiss-army knife for W3C Open Annotation system developers, eliminating many otherwise challenging backend development tasks. Keywords: annotation, biomedical, entity recognition, semantic web 1 Background Annotation of documents and databases on the Web is a core aspect of the Web’s interactivity [1], but until recently, annotations have been second-class objects, tied permanently to the applications and servers that host them [2]. This is a significant missing feature for the scientific and biomedical community, which increasingly re- lies on the Web as its primary means of knowledge dissemination and group interac- tion. As a result of this feature gap, comments, discussions, semantic tags, references, and other annotations on biomedical publications, are atomized across disparate ser- vers and media type based representations. Our goal is to fill this gap, making anno- tations first-class, independently-managed objects on the web. Projects from distributed hypermedia research programs in the 1980’s, upon which many aspects of the early Web were based, actually had several of these properties [3- 5]. Berners-Lee’s inspired stripping down of these models into “the simplest thing that could possibly work” [6], laid the basis for a transformative global, collaborative development of the Web and its technologies [7], but necessarily removed such fea- tures. In the early 2000’s, W3C’s Annotea project began to attempt restoration of some of the lost features based on the modern web architecture [8]. Annotea was a founda- tion for later annotation models and systems focused on biomedical annotation [9, 10]. Similar models were developed for digital humanities use cases. These specifica- tions were merged to develop a more diverse, community-based specification, the W3C Open Annotation Data Model [11], now on standards track in the W3C. While various annotation tools are now available and in use, current annotation platforms use different representation formats. Such tools normally provide little or no means to export the annotation in a usable or reusable fashion. The Open Annota- tion model is directed at solving the format interoperability issue, but interoperability on a large scale has not been showcased yet. It requires the existence of special too- ling to handle storage and distributed integration of the Open Annotation Data Model (OADM) format annotation, and our analysis indicated that this is a server-side issue. Furthermore, updating existing software is never an easy task and creating new software with an Open Annotation backbone requires significant knowledge of the OADM specifications and introduces software constraints. It appears that most anno- tation efforts focus a lot of their energy on the front-end or client as user interaction is key for adoption and the technical difficulties to work with different operative sys- tems and browsers are not trivial. Moreover, the annotation projects of which we are aware, all rely on different back-end software. We argue that developing many different back-ends, which per- form very similar operations, results in higher community costs and in a slower penet- ration of the Open Annotation specification. OADM services, if not based on a com- mon service model, will need to be implemented in several pieces of software before having different systems communicating and exchanging content. Lastly, we also have noted that the existing annotation back-ends implement very similar features that could be coded once and easily serve multiple clients. Within the scope of an extensible architecture, services like: (i) Open Annotation compliant sto- rage, (ii) text mining, (iii) entity recognition, (iv) image analysis and (v) Linked Data mashups could be implemented once as common services. This approach could re- duce the development time and the cost of future annotation platform whose develo- pers will be able to focus on new features and components without the necessity of reinventing the common functionality. 2 Methods Our group developed, and will demonstrate in this workshop, Annotopia: the first W3C OADM-compliant, biomedically-oriented Open Annotation server, in response to these challenges. Annotopia is a joint project of researchers at the Massachusetts General Hospital and Eli Lilly & Company. It is an open-source product (https://github.com/Annotopia). Annotopia operates as a Swiss Army Knife for anno- tation. It facilitates creation of interoperable annotation platforms, by providing an extensible back-end solution supporting the open W3C standard. Thus it allows deve- lopers to focus effort on client software, reducing development time and resources. Annotopia is constructed so that every Annotopia instance can support integration with (i) multiple annotation clients (ii) other Annotopia servers (iii) other Open Anno- tation compliant servers (iv) other non Open Annotation compliant servers (v) exis- ting text mining services (vi) pre-computed text mining results (vii) ontology ma- nagement platforms and custom databases for generating structured annotation (viii) Linked Data SPARQL end-points and much more. Annotopia incorporates features and ideas from two other annotation servers we developed in the recent period: CATCH and Domeo. We have extensively described Domeo elsewhere [10]. CATCH supports HarvardX Massively Open Online Courses (MOOCs), with textual and video annotation, for classes as large as > 20,000 stu- dents. While CATCH and Domeo focus on annotation of video, images and textual documents (HTML and PDF), Annotopia allows in addition, annotation of data, or of anything that is uniquely identifiable, even concepts in ontologies. Annotopia has been integrated and tested against the Annotator.js client, the Do- meo Web annotation toolkit, and the Utopia PDF viewer [12]. 3 Annotopia architecture and technologies Annotopia consists of a modular architecture providing a series of extension points. Extension points are necessary for handling custom structured annotation types as well as an always-increasing amount of external services to be integrated with the platform through appropriate connectors. The core platform is written in Java/Groovy [13] making use of the Grails [14] web application framework. The Grails plugin infrastructure has been extensively exploited for realizing the modular approach. A high level view of the architecture is shown in Figure 1. Fig. 1. - The high-level Annotopia architecture. Each block corresponds approximately to a software plugin or module. 4 Demonstration Our demonstration will showcase the annotation storage, search, and textmining in- tegration capabilities of Annotopia. We will also demonstrate interoperability between multiple HTML and PDF article representations. We expect in future to be able to demonstrate direct database annotation as well. 5 References 1. O'Reilly T: What Is Web 2.0: Design Patterns and Business Models for the Next Generati- on of Software. In.: O'Reilly Network; 2005 [http://www.oreillynet.com/lpt/a/6228]. 2. Ciccarese P, Soiland-Reyes S, Clark T: Web Annotation as a First-Class Object. IEEE In- ternet Computing 2013, Nov/Dec 2013:71.75 3. Bechhofer S, Goble C: COHSE: Conceptual Open Hypermedia Service. In: Annotation for the Semantic Web. Edited by Handschuh S, Staab S. Amsterdam: IOS Press; 2003. 4. Carr L, De Roure D, Hall W, Hill G: The Distributed Link Service: A Tool for Publishers, Authors and Readers. In: Fourth International World Wide Web Conference: December 11-14, 1995; Boston, Massachusetts, USA. World Wide Web Consortium (W3C) 1995: 5. De Roure D, Carr L, Hall W, Hill G: Enhancing the Distributed Link Service for multime- dia and collaboration. In: Distributed Computing Systems, 1997, Proceedings of the Sixth IEEE Computer Society Workshop on Future Trends of: 29-31 Oct 1997 1997. 330-335 6. Venners B: The Simplest Thing that Could Possibly Work: a Conversation with Ward Cunningham, Part V. Artima Developer 2004 [http://www.artima.com/intv/simplest.html]. 7. Jacobs I, Walsh N: Architecture of the World Wide Web, Volume One. In: W3C Recom- mendation World Wide Web Consortium; 2004 [http://www.w3.org/TR/2004/REC- webarch-20041215/]. 8. Kahan J, Koivunen M-R, Prud'Hommeaux E, Swick RR: Annotea: An Open RDF Infra- structure for Shared Web Annotations. In: WWW10 International Conference: May 2001 2001; Hong Kong. World Wide Web Consortium: [http://www10.org/cdrom/papers/488/index.html]. 9. Ciccarese P, Ocana M, Castro L, Das S, Clark T: An open annotation ontology for science on web 3.0. J Biomed Semantics 2011, 2(Suppl 2):S4 10. Ciccarese P, Ocana M, Clark T: Open Semantic Annotation of Scientific Publications with DOMEO. Journal of Biomedical Semantics 2012, 3(Suppl 1):S1 [http://www.jbiomedsem.com/content/3/S1/S1]. 11. Sanderson R, Ciccarese P, Sompel HVd, Bradshaw S, Brickley D, Castro LJG, Clark T, Cole T, Desenne P, Gerber A, Isaac A, Jett J, Habing T, Haslhofer B, Hellmann S, Hunter J, Leeds R, Magliozzi A, Morris B, Morris P, Ossenbruggen Jv, Soiland-Reyes S, Smith J, Whaley D: W3C Open Annotation Data Model, Community Draft, 08 February 2013. W3C 2013 [http://www.openannotation.org/spec/core/]. 12. Attwood TK, Kell DB, McDermott P, Marsh J, Pettifer SR, Thorne D: Utopia documents: linking scholarly literature with research data. Bioinformatics 2010, 26(18):i568-i574 [http://bioinformatics.oxfordjournals.org/content/26/18/i568.abstract]. 13. Henry K: A crash overview of groovy. Crossroads 2006, 12(3) 14. Rocher G, Brown J: The Definitive Guide to GRAILS. Berkeley CA: Apress; 2009.