=Paper= {{Paper |id=Vol-273/paper-4 |storemode=property |title=Adding Value to Biodiversity Images through Community Annotation |pdfUrl=https://ceur-ws.org/Vol-273/paper_53.pdf |volume=Vol-273 |dblpUrl=https://dblp.org/rec/conf/www/Riccardi07 }} ==Adding Value to Biodiversity Images through Community Annotation== https://ceur-ws.org/Vol-273/paper_53.pdf
ADDING VALUE TO BIODIVERSITY IMAGES THROUGH COMMUNITY ANNOTATION

Greg Riccardi
College of Information
Florida State University
Tallahassee, FL 32306-2100 USA
01-850-644-2869
Riccardi@ci.fsu.edu

Andrew Deans, David Gaitros, Neelima Jammingumpula, Corinne Jorgensen, Peter Jorgensen, Austin Mast, Karolina Maneva-Jakimoska, Debbie Paul, Fredrik Ronquist, Katja Seltman, Steven Winner
School of Computational Science, College of Information, and Department of Biological Science
Florida State University
ABSTRACT                                                                Discovering and recording ad-hoc data is the most problematic. It
Morphbank, an on-line collection of museum-quality biological           is particularly difficult to find ways that users can record
images, is an NSF funded project designed to facilitate the on-line     associations among objects.
collaboration of biologists from around the world [3]. Our primary      As long as data is well formatted and constrained to the database
focus is to aid in the collection and management of images that         schema then finding and retrieving it is simple. However, as
are useful in phylogenetic research. Morphbank users are actively       we’ve discovered, there is no practical limit to the amount of
collaborating on the creation of information that represents the        information a scientist may wish to store with a particular
associations among images and related biodiversity data objects.        specimen. Most of the knowledge is contained in the memory of
This paper describes the Morphbank annotation tool and data             these scientists or in handwritten notebooks. Although it is
models and gives examples of how users create structured                recognized that manual annotation is expensive and
information in the system. Schematized annotation provides              time-consuming, it is nevertheless essential in documenting
biologists with a flexible framework to create semantically-rich        collaborative knowledge in biological systems [2]. Translating
annotations using their own data models.                                and storing this knowledge in a searchable form is the challenge.
Keywords                                                                2. BACKGROUND
Annotation, association, biodiversity                                   Morphbank is an open Web repository of images serving the
                                                                        biological research community. It is currently being used to
1. INTRODUCTION                                                         document specimens in natural history collections, to voucher
The discovery, identification, and documentation of biological          DNA sequence data, and to share research results in disciplines
entities are time consuming and tedious tasks. The subtle               such as taxonomy, morphometrics, comparative anatomy, and
differences between similar species may be so minute as to              phylogenetics. Morphbank can serve as a virtual reference
require the collaboration of several experts to identify. Each          collection of named organisms or a resource for comparative
taxonomic group has many experts who can assist in the                  morphological study; new use cases are continuously added [7].
identification of specific organisms. However, with the increase in     Each image in the database is associated with a fully searchable set
the number of new organisms that have been discovered and a             of text information. Additionally, images can be downloaded in
decrease in the number of senior specialists, identification and        several different formats [3]. Understanding the background of
curation of data have become more difficult. Often, this involves the   Morphbank is important to understanding the complexity of the
need for scientists to travel to the location of the specimens or for   problem of collaborating with other scientists on the identification
specimens to be sent to the scientists for first hand examination.      and curation of biodiversity data.
This is still standard practice among most biologists today.
Morphbank contains information about organisms. Each image in           2.1 MORPHBANK OBJECTS
the system is associated with one or more specimens. Each               Each object in the Morphbank system is uniquely identified and
specimen is a representation of information about an organism.          includes a set of standard fields that assist us in cataloging the
Specimens are in turn associated with localities, contributors,         location and type of each object, the identification of the user who
taxonomic concepts, and a variety of annotations.                       added the object, the date and time of creation, an optional
                                                                        description of the object, and the last time the object was
The design and development of the Morphbank system identified           modified. These attributes allow anyone accessing Morphbank
several challenges in discovering and creating information about        sufficient information to find and catalog data and associate
images and their related objects.                                       related objects. Each object is externally identified by a Life
                                                                        Science Identifier (LSID) [13].
 Finding images and specimens associated with a specific
 species and genus,                                                     2.2 MORPHBANK OBJECT
 Finding and recording information about that image and its            RELATIONSHIPS
 related objects, and                                                   Since each Morphbank object is uniquely identified, any object
 The discovery and recording of ad-hoc associations among              can be the target of a stored reference. A single column within a
 the various objects.                                                   Morphbank table holding a foreign key may refer to an
                                                                        object of any type. Thus a collection object can be heterogeneous.
                                                                        For instance, an annotation object may define an association
 1                                                                      among images, specimens, locations, users, or even other
  Supported by NSF contract DBI-0446224, 2005-2008
 WWW 2007, May 8--12, 2007, Banff, Canada.                              annotations.
This flexibility allows for the creation of complex collections of   made digital annotations somewhat cumbersome. The increased
objects that can be shared with other users of the Morphbank         use of Javascript, higher speed communications, improved Web
system. Although there are a series of predefined relationships in   interface standards, and increased browser capability have made
Morphbank, the use of unique identifiers allows users to define an   Web-based digital annotations more of a reality. However, there is
unrestricted set of complex relationships of objects within the      still no convenient method for making annotations on the sides of
confines of the system.                                              Web pages as you would on paper documents [8].
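The object model described in Sections 2.1 and 2.2 can be illustrated with a minimal sketch. It is not the Morphbank schema: the class names, field names, and the in-memory `Registry` are assumptions made for illustration. What it shows is that a single id space lets one reference column point at an object of any type, so a collection or annotation can be heterogeneous and can itself be referenced.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class BaseObject:
    """Standard fields carried by every object (field names are assumed)."""
    id: int                      # unique across ALL object types
    object_type: str             # e.g. "Image", "Specimen", "Annotation"
    user_id: int                 # user who added the object
    date_created: datetime
    date_last_modified: datetime
    description: Optional[str] = None


@dataclass
class Collection(BaseObject):
    """Members are stored as bare ids, so they may be objects of any type."""
    member_ids: List[int] = field(default_factory=list)


class Registry:
    """In-memory stand-in for the database: one id space for all objects."""

    def __init__(self) -> None:
        self._objects: Dict[int, BaseObject] = {}

    def add(self, obj: BaseObject) -> BaseObject:
        if obj.id in self._objects:
            raise ValueError(f"duplicate id {obj.id}")
        self._objects[obj.id] = obj
        return obj

    def resolve(self, object_id: int) -> BaseObject:
        return self._objects[object_id]

    def member_types(self, collection: Collection) -> List[str]:
        return [self.resolve(i).object_type for i in collection.member_ids]


now = datetime.now(timezone.utc)
registry = Registry()
registry.add(BaseObject(1, "Image", 7, now, now, "lateral view"))
registry.add(BaseObject(2, "Specimen", 7, now, now))
note = registry.add(Collection(3, "Annotation", 7, now, now, member_ids=[1, 2]))
# An annotation can itself be the target of another annotation.
meta = registry.add(Collection(4, "Annotation", 8, now, now, member_ids=[3, 1]))
```

Because members are resolved through the shared id space, `registry.member_types(meta)` reports an annotation and an image side by side without any type-specific reference columns.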
Figure 1 shows the result of searching for images that are related   The problem of biodiversity annotation is that biologists have
to the taxon with id 30244, the species Asclepias amplexicaulis.     increased the number of specimens they can gather but have not
The search looks through the known associations between objects      increased their ability to catalog, identify, and study them.
to find the proper set. Each image in the set is associated with a   Collaborations still include the exchange of physical specimens
specimen which is associated with the proper taxon. The structure    and the manual annotation of the images using index cards and
of these predefined associations allows the search to be both        paper documents. At the functional level, many users have
effective and efficient. The information about the images in         developed their own specific but proprietary solution to this
Figure 1 comes from the image, its related specimen and its          problem. Through the use of Morphbank and a Web based
related taxon.                                                       annotation tool, we can solve most if not all of these problems.
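The taxon search of Figure 1 follows the predefined chain image, specimen, taxon. The sketch below illustrates that traversal under stated assumptions: the two lookup tables and every id except taxon 30244 are hypothetical, and the real system performs the search in its database rather than in memory.

```python
from typing import Dict, List, Set


def images_for_taxon(
    taxon_id: int,
    specimen_taxon: Dict[int, int],
    image_specimens: Dict[int, Set[int]],
) -> List[int]:
    """Return ids of images linked to at least one specimen of the taxon."""
    # Step 1: specimens whose determination points at the requested taxon.
    matching = {s for s, t in specimen_taxon.items() if t == taxon_id}
    # Step 2: images associated with any of those specimens.
    return sorted(
        img for img, specs in image_specimens.items() if specs & matching
    )


# Hypothetical associations: specimen id -> taxon id, image id -> specimen ids.
specimen_taxon = {501: 30244, 502: 30244, 503: 99999}
image_specimens = {9001: {501}, 9002: {502}, 9003: {503}, 9004: {501, 503}}

result = images_for_taxon(30244, specimen_taxon, image_specimens)
```

Image 9003 is excluded because its only specimen belongs to a different taxon, while 9004 is included because one of its two specimens matches.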

                                                                     3.1 MORPHBANK OBJECT ANNOTATION
                                                                     A variety of annotation technologies allow users to add value to
                                                                     images by creating associations between those images, text and
                                                                     other digital objects. Morphbank takes this one step further by
                                                                     making the associations into first class objects that can themselves
                                                                     be annotated and associated with other objects. Morphbank also
                                                                     allows associations to take on specific semantic characteristics
                                                                     that constrain their meaning and thereby improve searching and
                                                                     understanding.
                                                                     Image annotation is available in a variety of image management
                                                                     Web sites. The simplest annotations are found in systems that
                                                                     support attaching tags to images and other media. Flickr.com and
                                                                     YouTube.com, e.g., allow users to add text attributes (tags) to
                                                                     images and use those tags to support searching. FotoTagger.com,
                                                                     among others, goes a step further and allows the tags to be
                                                                     attached to specific locations on images.
                                                                     Blogging is another form of image annotation in which text
 Figure 1. The result of searching for images for a particular       passages are linked to images, Web pages and other digital
                             taxon                                   objects. A blog entry creates an association between its own text and
                                                                     the linked objects.
3. BIOLOGICAL ANNOTATION
                                                                     Annotea.org supports the creation of RDF attributes for image
REQUIREMENTS                                                         tags. These attributes can be used to provide search inference
The users of the Morphbank database system have identified
                                                                     capabilities for users of image repositories.
several requirements for image and object annotation to be used
by authorized users of the system. These requirements are            Another annotation strategy involves the development of
consistent with the Specifications For Image Annotation On The       laboratory notebooks such as those under development at the
Semantic Web as described by the W3C in its draft document [5]. A    United States Department of Energy, National Collaboratories
major restriction placed on Morphbank development was that the       under the guidance of Dr. Jim Myers [11]. These middle-ware
annotation software must be accessible through the use of a Web      products present researchers, applications, problem-solving
browser without the need to download an extensive set of client      environments (PSE), and software agents with a layered set of
based applications. This requirement was established because         application services that provide a finite set of capabilities for the
research biologists frequently travel from one location to another   creation and management of meta-data, the definition of semantic
and many times only have access to a Web browser. Additionally,      relationships between data objects, and the development of
annotations must be made in real-time and directly to the actual     electronic research records [10]. Users are able to record
data source to avoid update anomalies associated with multiple       associations between digital objects across and among projects.
copies of the data. Updates and annotations made by one scientist
must be readily available to other colleagues for collaboration in a Morphbank seeks to combine these ideas by
timely manner.                                                       incorporating an extensible annotation type system and by
                                                                     systematically expanding the scope of associations by including
There has been considerable effort put into the development of       any objects referenced by globally unique IDs (GUID).
general purpose Web-based annotation tool sets over the past
several years. In their paper on Web annotations, Venu               Morphbank was designed to allow users to take advantage of Web
Vasudevan and Mark Palmer [15] described an approach 6 years         service products to gain access to the data by conforming to
ago on the development of a Web based annotation tool that could     industry practices and standards but maintain the ontology of the
be used to annotate documents over the Internet with just the use    original data. Users will browse or search the Web site for
of a Web browser. However, they discovered several limitations       Morphbank objects using a variety of tools provided through the
in the use of Web browsers and of HTML as layout languages that      Web site.
3.2 BASIC ANNOTATION TEMPLATE                                            with each other. The user will select any two Morphbank objects
An annotation is an assertion that a collection of objects is            (image, specimen, view, location, publication, user, group, etc.)
related in a particular way. For annotation and search purposes,         and then describe the relationship between the two.
the Morphbank object annotation tool provides a minimum set of         4. EXAMPLES OF ANNOTATIONS
tools common to all annotation requirements. The tool uses the         Specimen image annotation captures people’s knowledge of
terminology of the Darwin Core [1] biodiversity ontology               species such as new observations, and disagreements with
initiative. We strove to keep the tool-set as simple and as            previous annotations. Image annotation enables semantic image
straightforward as possible and to provide specializations that make it retrieval and maintains a record of user comments concerning the
easy for particular types of annotations to be created.                data. Furthermore, a collection of featured annotations provides a
Flexibility is particularly important because all annotations must     way to assign species to a specimen. Image annotation associates
be made using only a Web browser. The template for the tool            textual information to the specific region of an image to enable
defines several functional areas required for basic biodiversity       semantic querying.
annotation and specimen determination.                                 Two approaches are frequently used: text-based and
                                                                       field-based. The former simply adds keywords to the
3.3 TYPES OF ANNOTATIONS                                               whole image using natural language. However, keyword-based
Using the ability to store complex metadata with annotations
                                                                       retrieval returns irrelevant documents (i.e., low accuracy of
allows us to define associative semantic relationships with
                                                                       retrieval). A field-based method describes and retrieves an item
ad-hoc data and other Morphbank data. The data model that
                                                                       using one or more field-value pairs, thus improving retrieval
supports annotation is intended to be extended to incorporate
                                                                       precision. Figure 2 shows an image annotation of the field-based
additional types as needed by users. The categories of annotations
                                                                       approach. This annotation asserts that a particular portion of an
in the current system are as follows:
                                                                       image (of a wasp leg) is a femur.
 General: There are instances where users desire to make
 some ad-hoc comments concerning a collection of images,
 specimens or other objects. The requirement for this type of
 annotation was made to allow maximum flexibility for
 including comments, measurements, and other related data to be
 stored and associated with the collection of objects. A very
 useful example of a general annotation is a simple collection of
 objects, much like a shopping cart, that can be stored,
 organized, and labeled for later use.
 Image: In a phylogenetic database such as Morphbank, images are vitally
 important to the users of the system. Therefore, many of the
 annotation types described in this section will apply specifically
 to images. The types of image annotations are listed as:
   Spot location on an image associated with the annotation.
   The user will identify a specific spot on the image to associate
   with a label, title, and paragraph description.
   Circle associated with an area on the image. The user will
   place a circle encapsulating an area to associate with a label,                 Figure 2. An Image Annotation Example
   title, and paragraph description.
   Rectangle associated with an area on image. The user will          However, both text-based and field-based approaches store the
   place a rectangle encapsulating an area to associate with a         information in a plain text format. It is known that querying the
   label, title, and paragraph description.                            plain text is inefficient. Furthermore, storing annotation
 Taxon Determination: Used for discussion concerning the              information using only plain text is not suitable to satisfy the
 species or other taxonomic determination of a specimen. Users         higher level requirements for the system. Meaning and ontology
 will select a specimen and by using the associated images, make       must be associated with the data. The heterogeneous data models
 a recommendation as to the specific genus and species                 from different biologists and the diversity of association types
 determination. Taxon determinations are extremely important to        require frequent update and evolving data structures.
 the research activities of the primary users.
                                                                       Figure 3 shows a Morphbank image annotation in context. The
 Phylogenetic Character and State: This type of annotation            annotation contains attribution (upper left), a small instance of the
 will be used to organize physical features (called “characters”)      annotated image (upper right), detailed comments, with technical
 of organisms into objects of interest to research users.              terms highlighted (lower left), and brief descriptions of other
 Phylogenetic characters and possible values (states) of those         annotations of the same image (lower right).
 characters are associated with specific images, with species, and
 with collections of species. In this type of annotation, the user     The annotation of Fig. 3 asserts that the wasp whose leg is shown
 will associate an image or specimen in the database with              has a particular feature, which is called “femur swollen medially”.
 phylogenetic characters and states.                                   Such features are used by experts to categorize specimens into
                                                                       taxonomic units (genus, species, etc.) and, after analysis, to
 Relationship: Morphbank comes standard with predefined               develop evolutionary models.
 data relationships. Relationship annotations allow the user to
 define additional relationships associating Morphbank objects
Morphbank is using annotation and association technology to
collect information that is directly used in scientific research.
Each of the Morphbank objects related to the annotation of Figure
3 (the image, the annotations, the related specimen, etc.) is
represented as a first-class object with a globally unique identity.
Thus the objects can be stored in collections, included in other
annotations, and referenced in external sites.
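The contrast between text-based and field-based annotation drawn in Section 4 can be made concrete with a small sketch. The class and field names here are hypothetical and are not taken from the Morphbank schema; the example only shows why matching on a field-value pair attached to an image region is more precise than matching keywords in free text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Rectangle:
    """Rectangular region of an image, in pixels (assumed representation)."""
    x: int
    y: int
    width: int
    height: int


@dataclass
class ImageAnnotation:
    id: int
    image_id: int
    region: Rectangle
    comment: str                                  # free-text description
    fields: Dict[str, str] = field(default_factory=dict)  # field-value pairs


def keyword_search(annotations: List[ImageAnnotation], word: str) -> List[int]:
    """Text-based retrieval: match the word anywhere in the comment."""
    return [a.id for a in annotations if word.lower() in a.comment.lower()]


def field_search(
    annotations: List[ImageAnnotation], name: str, value: str
) -> List[int]:
    """Field-based retrieval: match an exact field-value pair."""
    return [a.id for a in annotations if a.fields.get(name) == value]


annotations = [
    ImageAnnotation(
        1, 100, Rectangle(40, 60, 120, 80),
        "Femur swollen medially.",
        {"structure": "femur", "character": "femur swollen medially"},
    ),
    ImageAnnotation(
        2, 100, Rectangle(10, 10, 50, 50),
        "Tibia; joins the femur above this region.",
        {"structure": "tibia"},
    ),
]

keyword_hits = keyword_search(annotations, "femur")
field_hits = field_search(annotations, "structure", "femur")
```

The keyword search returns both annotations because the tibia comment happens to mention the femur, whereas the field search returns only the annotation whose region is asserted to be a femur.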




                                                                         Figure 5. Morphbank display of the image of a herbarium
                                                                                                 sheet
                                                                       Creating the determination annotation sheet began with interviews
                                                                       with domain experts and the evaluation of typical manual records.
                                                                       Figure 6 shows a detail of the herbarium sheet of Figure 5 that
                                                                       contains the information cards that are attached to the sheet. Two
                                                                       cards are attached. The lower card is the primary information
                                                                       about the specimen including who collected it, when and where.
            Figure 3. Image Annotation In Context                      The lower card also shows the species determination that was
Mass annotations are possible as well. Figure 4 shows an interface     recorded when the specimen was collected.
that allows a user to annotate each of a group of objects. In this
case, the user is preparing to comment on the species
identification, also called the determination of several botanical
specimens. This annotation interface has been developed to enable
a specific activity to be performed by experts on plant
morphology.
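A group determination annotation of the kind shown in Figure 4 can be sketched as one record per specimen, each recording agreement with the existing determination or disagreement together with a proposed taxon. The record layout, the function names, and the "D" label for disagreement are assumptions for illustration; only the "A" label for agreement appears in the interface described in this paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Determination:
    specimen_id: int
    annotator: str
    agrees: bool
    proposed_taxon_id: Optional[int] = None   # set only when disagreeing


def annotate_group(
    specimen_ids: Iterable[int],
    annotator: str,
    agrees: bool,
    proposed_taxon_id: Optional[int] = None,
) -> List[Determination]:
    """Apply the same determination opinion to every specimen in a group."""
    if agrees and proposed_taxon_id is not None:
        raise ValueError("an agreement must not propose a different taxon")
    if not agrees and proposed_taxon_id is None:
        raise ValueError("a disagreement must propose a taxon")
    return [
        Determination(s, annotator, agrees, proposed_taxon_id)
        for s in specimen_ids
    ]


def tally(records: Iterable[Determination]) -> Dict[str, int]:
    """Count agreements ("A") and disagreements ("D", hypothetical label)."""
    return dict(Counter("A" if r.agrees else "D" for r in records))


records = annotate_group([11, 12, 13], "expert_a", agrees=True)
records += annotate_group([14], "expert_a", agrees=False, proposed_taxon_id=30244)
summary = tally(records)
```

Keeping one record per specimen, rather than one record per group, means each opinion can later be searched, counted, or annotated on its own.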




                  Figure 4. Group Annotations                                Figure 6. Information card from herbarium sheet
                                                                       The upper card shows a determination annotation that was added
5. PRELIMINARY RESULTS                                                 to the specimen in 1983. J. Farmer of the University of North
The Morphbank research team has been working closely with a            Carolina agreed that the determination was correct.
group of botanists at the Department of Biological Sciences at
Florida State University to use the annotation tool for the curation   In pencil, between the two cards is second annotation. D. D. Ward
of specimens from the Robert K. Godfrey Herbarium at Florida           in 1983 also agreed on the correctness of the determination.
State University. Figure 5 shows some of the Morphbank
information for a typical herbarium sheet.                             The Morphbank annotation tool is intended to allow the online
                                                                       collection and dissemination of information like that shown in
Fig. 6. The tool will allow researchers to evaluate the                       integration. In Workshop on Knowledge Markup and
determination of the specimen, that is, the association between               Semantic Annotation, KCAP03, 2003.
each specimen and its taxon. The activity is an evaluation of the
                                                                         [3] D. Gaitros, G. Riccardi, F. Ronquist, N. Jammigumpula, and
quality of the information stored in the herbarium.
                                                                              W. Blanco. Morphbank, the development of a general
A major benefit of the Web tools is their support for distributed             purpose bioinformatics database. Conference on Internet
collaboration. Before the sheets were                                         Computing (ICOMP’05), pages 31–37, Jun 2005.
The annotation interface shown in Fig. 4 can be used to agree with       [4] L. Haas, D. Kossmann, E. Wimmers, and J. Yang. An
the recorded determination of the set of specimens, or to disagree            optimizer for heterogeneous systems with non-standard
and select a different taxon. In this way the annotation represents           data search capabilities. in special issue on query processing
a qualitative evaluation of the recorded information. Fig 4 shows             for non-standard data. IEEE Data Engineering Bulletin 19(4),
that 19 annotations already record agreement (A) with the                     pages 37–43, Dec 1996.
determination.                                                           [5] C Halaschek-Wiener, J Hunter, N Simou, J Smith, and V
The results so far are very promising. Fifteen taxonomists were               Tzouvaras. Image annotation on the semantic Web, Jan 2006.
asked to use Morphbank images of specimens from the Robert K.            [6] P. Korica, H. Maurer, and N. Scerbakov. Extending
Godfrey Herbarium at Florida State University to make digital                 annotations to make them truly valuable. World Conference on
determination annotations for 50 specimens each. The scientists               E-Learning in Corporate, Government, Healthcare, and
found the online tools to be an excellent replacement for the                 Higher Education (ELEAN) 2005, 2005.
manual task. They were particularly pleased to be able to see the
results online and to be able to see the effects of this online          [7] J Liljeblad and F Ronquist. A phylogenetic analysis of higher-
collaboration.                                                                level gall wasp relationships (Hymenoptera: Cynipidae).
                                                                              Systematic Entomology, 23:229–252, 1998.
An additional study of the feasibility of making determinations
from images in lieu of physical specimens was conducted by
                                                                         [8] P. Marshall. Annotations: From paper books to the digital
                                                                              library. in Proceedings of the ACM Digital Libraries 97
bringing some of these experts to Florida. The study is ongoing.
                                                                              Conference, Philadelphia, PA, Jul 1997.
We hope to be able to establish that digital representations of
these specimens are more than adequate replacements for the real         [9] C Meng. Biological information standards. Bulletin of the
objects.                                                                      American Society for Information Science and Technology,
                                                                              2004.
6. CONCLUSION
We have described an existing need in the biological community           [10] J Myers. http://collaboratory.emsl.pnl.gov/, 2004.
to store and retrieve complex information on specimens and related       [11] J Myers, A Chappell, M Elder, A Geist, and J Schwidder.
images. In creating a Web site that stores the elements common to             Reintegrating the research record. IEEE Computing in
all entities in the Tree of Life, we have made biodiversity research          Science and Engineering, May 2003.
more effective.
                                                                         [12] MySQL. http://dev.mysql.com/techresources/articles/mysql-5.1-xml.html.
Our work in developing a tool that allows users to annotate
images via the Web using only the essential elements has proven          [13] D. Smith, S. Martin, and B. Szekely. LSID (Life Science
successful. The non-intrusive method permits biologists to mark               Identifier) project, 2005. http://lsid.sourceforge.net.
images without altering the original image, and share these
annotations with others in an easy and open format. Our hope is          [14] P Spyns, R Meersman, and M Jarrar. Data modeling versus
that the work performed under this NSF grant by the Morphbank                 ontology engineering. SIGMOD Record, 31(4):12–17,
project will provide the Tree-of-Life initiative with a stable digital        December 2002.
image database and annotation tool set that can be used by               [15] V. Vasudevan and M. Palmer. On Web annotations:
biologists around the world.                                                  Promises and pitfalls of current Web infrastructure. 32nd
                                                                              Hawaii International Conference on Systems Sciences, Jan
7. REFERENCES                                                                 1999.
[1]    L. Alexander, A. Runyan, and V. Anderson. Taxonomic
      data working group, Darwin Core 2. TDWG.org
[2] A Dingli, F Ciravegna, and Y Wilks. Automatic semantic
      annotation using unsupervised information extraction and