=Paper=
{{Paper
|id=Vol-1933/paper-2
|storemode=property
|title=AnnoSys2: Reaching out to the Semantic Web
|pdfUrl=https://ceur-ws.org/Vol-1933/paper-2.pdf
|volume=Vol-1933
|authors=Okka Tschöpe,Lutz Suhrbier,Anton Güntsch,Walter G. Berendsohn
|dblpUrl=https://dblp.org/rec/conf/semweb/TschopeSGB17
}}
==AnnoSys2: Reaching out to the Semantic Web==
<pdf width="1500px">https://ceur-ws.org/Vol-1933/paper-2.pdf</pdf>
<pre>
         AnnoSys2: Reaching out to the Semantic Web

     Okka Tschöpe1, Lutz Suhrbier2, Anton Güntsch3 and Walter G. Berendsohn4

                          BGBM, Freie Universität Berlin, Germany
                              1 o.tschoepe@bgbm.org
                              2 l.suhrbier@bgbm.org
                              3 a.guentsch@bgbm.org
                              4 w.berendsohn@bgbm.org


       Abstract. AnnoSys is a web-based open-source system for correcting and en-
       riching specimen data in publicly available data portals, thereby bringing tradi-
       tional annotation workflows for biodiversity data to the Internet. During its first
       phase, the project developed a fully functional prototype of an annotation data
       repository for complex and cross-linked XML-standardized data, including
       back-end server functionality, web services and an on-line user interface. Anno-
       tation data are stored using the Open Annotation Data Model and an RDF-
       database. The current project phase aims at extending the generic qualities of
       AnnoSys to further structured data formats including RDF data with machine
       readable semantic concepts, thus opening up the data gathered through An-
       noSys for the Semantic Web. We developed a semantic concept-driven annota-
       tion management, including the specification of a selector concept for RDF data
       and a repository for original records extended to RDF and other formats. Since
       many of the biodiversity data standards in use are still not defined in a seman-
       tic-web compliant way, mechanisms for referencing elements in such data sets
       need to be developed. We therefore developed an AnnoSys ontology based on
       DwC RDF terms and the ABCD ontology, which deconstructs the ABCD
       XML-schema into individually addressable RDF-resources published via the
       TDWG Terms Wiki. We mapped the terms from these standards into annotation
       types we defined, based on semantic concepts.

       Keywords: AnnoSys, Ontology, Annotation.


1      Introduction

Biodiversity data are aggregated, linked and made globally accessible via a range of
Internet portals and services. Globally, natural history collections contain 2–3 billion
specimens [1]. These provide materials and primary data for a wide range of research
questions and form the basis for the classification of organisms into species and other
“taxa”. Traditionally, specimens are annotated by researchers with written annotation
labels which are applied directly to the physical object, thus becoming accessible to
succeeding observers of the specimen. These annotations improve the data quality of
the collection and document research developments over time (e.g. the understanding
of taxon concepts).
   To ensure the continuance of the traditional data sharing and incremental docu-
mentation of specimens in the on-line environment, the AnnoSys project developed
an annotation data repository [2] for complex XML data following the ABCD [3] and
DwC [4] standards. This includes back-end server functionality, web services and an
on-line user interface [5]. Annotation data are stored using the Web Annotation Data
Model [6] and an RDF-database [7].
   In a second step, AnnoSys2 aims at extending the generic qualities of AnnoSys to
further structured data formats including RDF data with machine readable explicit
semantic concepts.


2      Motivation/State of the art

Since many of the biodiversity data standards in use are still not defined in a seman-
tic-web compliant way, mechanisms for referencing elements in such data sets need to
be developed. We therefore compiled an AnnoSys ontology based on DwC RDF
terms and the ABCD ontology, which deconstructs the ABCD XML-schema into
individually addressable RDF-resources published via the TDWG Terms Wiki [8].
   One of our motivations for the new ontology was to harmonize annotatable ele-
ments to allow unambiguous comparability between different versions of a record.
For example, depending on the data publishing portal, a record can be displayed either
in the DwC or the ABCD standard. In AnnoSys 1 we were facing the problem that
those records were not directly comparable, because not all ABCD elements are part
of the DwC standard and vice versa. We therefore needed different version histories
for different data standards (DwC, ABCD 2.06, ABCD 2.1 etc.). The AnnoSys ontology
defines matching rules describing how these different elements are transformed
into annotatable elements, resulting in harmonized records with only one, unambigu-
ously comparable version history.
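
   The harmonization step can be illustrated with a minimal sketch. The paper does not
show the actual format of the matching rules, so the dictionary below is an assumption;
the two element names come from Table 1, and the record values are made up.

```python
# Hypothetical matching rules: source element -> harmonized AnnoSys concept.
# The element names are taken from Table 1; the rule format is an assumption.
MATCHING_RULES = {
    "dwc:ScientificName": "Full scientific name",
    "abcd2:TaxonIdentified-FullScientificNameString": "Full scientific name",
}


def harmonize(record: dict) -> dict:
    """Re-key a record by AnnoSys concept, dropping elements without a rule."""
    return {
        MATCHING_RULES[element]: value
        for element, value in record.items()
        if element in MATCHING_RULES
    }


# The same specimen as delivered by two portals (values are made up).
dwc_record = {"dwc:ScientificName": "Bellis perennis L."}
abcd_record = {"abcd2:TaxonIdentified-FullScientificNameString": "Bellis perennis L."}

# After harmonization both versions share one key space and can be compared.
print(harmonize(dwc_record) == harmonize(abcd_record))
```

Once both records are keyed by concept rather than by standard-specific element, a
single version history per record is sufficient.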
   Additionally, the different SKOS mapping relations allow equivalence levels to be
specified for element matches, which potentially allows restricting the use of elements
to those with a minimum level of equivalence. This may be important for data formats
that need to be integrated in the future.
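
   Restricting elements to a minimum level of equivalence could look like the following
sketch. SKOS does not define a numeric order for its mapping properties, so the ranking
used here is purely an illustrative assumption, as is the function name.

```python
# Assumed ranking of SKOS mapping properties, strongest first. SKOS itself
# does not define such an order; this ranking is an illustrative assumption.
EQUIVALENCE_RANK = {
    "skos:exactMatch": 3,
    "skos:closeMatch": 2,
    "skos:broadMatch": 1,
    "skos:narrowMatch": 1,
}

# (AnnoSys concept, mapping property, source element), following Table 1.
MAPPINGS = [
    ("Full scientific name", "skos:exactMatch", "dwc:ScientificName"),
    ("Full scientific name", "skos:closeMatch",
     "abcd2:TaxonIdentified-FullScientificNameString"),
    ("Scientific Name Authorship", "skos:narrowMatch",
     "abcd2:TaxonIdentified-AuthorTeam"),
]


def elements_with_min_level(mappings, minimum):
    """Return source elements whose mapping is at least as strong as `minimum`."""
    threshold = EQUIVALENCE_RANK[minimum]
    return [element for _, relation, element in mappings
            if EQUIVALENCE_RANK[relation] >= threshold]


# Keep only elements that are at least a close match to their concept.
print(elements_with_min_level(MAPPINGS, "skos:closeMatch"))
```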


3      Model construction

We used Protégé [9] to build an AnnoSys ontology based on DwC terms [4] and the
ABCD ontology [8], which uses ABCD property terms as RDF predicates. We created
a class “RecordConcept”, comprising all ontology concepts, as a subclass of
skos:Concept (Fig. 1). We also defined nine different “annotation types” as instances
of the class “annotation type”, a subclass of oa:Motivation (Fig. 1, Fig. 2). Individual
concepts were related to the different annotation types via the skos:related relation. We
then mapped the elements of the two standards to semantic concepts using the SKOS
mapping properties skos:exactMatch, skos:broadMatch, skos:narrowMatch, or
skos:closeMatch, respectively, to represent the different levels of matches (Table 1).
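
   The class structure described above can be sketched with plain dictionaries. The
"annosys:" prefix and the local names are hypothetical (the paper does not give the
ontology namespace); only the subclass relations and the annotation type
“Determination” are taken from the text.

```python
# Class hierarchy as described in the text; the "annosys:" prefix is hypothetical.
SUBCLASS_OF = {
    "annosys:RecordConcept": "skos:Concept",
    "oa:Motivation": "skos:Concept",
    "annosys:AnnotationType": "oa:Motivation",
}

# skos:related links from concepts to annotation types (two rows of Table 1).
RELATED = {
    "annosys:FullScientificName": "annosys:Determination",
    "annosys:ScientificNameAuthorship": "annosys:Determination",
}


def is_subclass_of(cls, ancestor):
    """Follow subclass links upwards until `ancestor` or the root is reached."""
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls == ancestor:
            return True
    return False


def concepts_for(annotation_type):
    """List the concepts that are skos:related to one annotation type."""
    return sorted(c for c, t in RELATED.items() if t == annotation_type)


print(is_subclass_of("annosys:AnnotationType", "skos:Concept"))
print(concepts_for("annosys:Determination"))
```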


                 Fig. 1. Subclasses of skos:Concept in the AnnoSys Ontology.


Concepts that refer to identifiers of the institution, the collection or the unit are not
related to an annotation type but are also instances of the subclass “RecordConcept”.
These concepts are not annotatable, but are important in their function as identifiers,
e.g. to query for records related to a given triple id (the identifier originally used in
schemas describing specimens, composed of three ids designating the holding institution,
a collection within the institution, and the catalogue number within that collection).
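
   As a rough illustration of querying by triple id, the sketch below models the three
components as a named tuple and uses it as a lookup key. The field names, the in-memory
store and all values are assumptions for illustration only.

```python
from typing import NamedTuple


class TripleId(NamedTuple):
    """Institution, collection and catalogue number identifying a specimen."""
    institution: str
    collection: str
    catalogue_number: str


# Hypothetical record store keyed by triple id (all values are made up).
RECORDS = {
    TripleId("INST", "COLL", "0001"): ["record version 1", "record version 2"],
    TripleId("INST", "COLL", "0002"): ["record version 1"],
}


def records_for(triple_id):
    """Return all stored record versions for a triple id, or an empty list."""
    return RECORDS.get(triple_id, [])


print(records_for(TripleId("INST", "COLL", "0001")))
```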


 Fig. 2. “Annotation type” is a subclass of oa:Motivation, which is a subclass of skos:Concept.

 Table 1. Example concepts of the AnnoSys Ontology and their mappings for annotation type “Determination”

Concept in AnnoSys ontology              | SKOS exact match             | SKOS close match                               | SKOS narrow match                                    | SKOS related
-----------------------------------------+------------------------------+------------------------------------------------+------------------------------------------------------+-------------------------------
Full scientific name                     | dwc:ScientificName           | abcd2:TaxonIdentified-FullScientificNameString |                                                      | Annotation type: Determination
Scientific Name Authorship               | dwc:ScientificNameAuthorship |                                                | abcd2:TaxonIdentified-AuthorTeam                     | Annotation type: Determination
                                         |                              |                                                | abcd2:TaxonIdentified-AuthorTeamAndYear              |
                                         |                              |                                                | abcd2:TaxonIdentified-AuthorTeamOriginalAndYear      |
Scientific Name Authorship Parenthetical |                              |                                                | abcd2:TaxonIdentified-ParentheticalAuthorTeamAndYear | Annotation type: Determination
                                         |                              |                                                | abcd2:TaxonIdentified-AuthorTeamParenthesis          |
                                         |                              |                                                | abcd2:TaxonIdentified-AuthorTeamParenthesisAndYear   |



   A prototype of the system is available under
https://dev-annosys.bgbm.fu-berlin.de/AnnoSys/AnnoSys.


4       Evaluation

The ontology is composed of around 150 data properties that are related to nine
annotation types. Since concepts are now defined in a semantic-web compliant way, they
can be stored together with the record in the same triple store (whereas in AnnoSys 1,
records were stored in an XML database). This allows more complex searches
and significantly improves the performance of the system. AnnoSys data properties
cover the classic annotation workflows in the biodiversity collection data domain.
However, the ontology is potentially expandable for other workflows and other do-
mains.
   When aiming to integrate annotations for specimens from different data portals, it
is essential to be able to identify annotated specimens universally. Therefore, An-
noSys 2 builds persistent identifiers for all objects (records, specimens and annota-
tions) from UUIDs, making the system independent of the previously used tripleIds.
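
   A minimal sketch of UUID-based identifiers follows. The paper states only that
identifiers are built from UUIDs; the use of random (version 4) UUIDs, the URN form and
the Annotation class are assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


def new_identifier() -> str:
    """Mint a random (version 4) UUID in URN form (an assumed scheme)."""
    return uuid.uuid4().urn


@dataclass
class Annotation:
    """Hypothetical annotation object that refers to its target by identifier."""
    target_specimen: str
    body: str
    identifier: str = field(default_factory=new_identifier)


# Specimen and annotation each get their own identifier; neither depends on
# institution, collection or catalogue number.
specimen_id = new_identifier()
annotation = Annotation(target_specimen=specimen_id, body="Determination corrected")
print(annotation.identifier.startswith("urn:uuid:"))
```

Because the identifier carries no information about the holding institution, it stays
valid when the same specimen is published through a different portal.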


5      Conclusion

Our work tackles the development of an extensible and format-independent system
for virtual annotation of biological specimen label data. To this end, we compiled an
"AnnoSys-Ontology" mapping essential concepts defined by the widely accepted
community standards DarwinCore and ABCD. Annotations are entered via an open
browser interface and stored centrally in an RDF triple store following the W3C Web
Annotation Data Model.
   The system is currently in the testing phase and will be released in 2018. In future
research, we will examine the use of AnnoSys for taxon-level data as well as its inte-
gration with image annotation systems.


References
 1. Duckworth, W.D., Genoways, H.H., Rose, C.L. et al.: Preserving Natural Science Collec-
    tions: Chronicle of Our Environmental Heritage. National Institute for the Conservation of
    Cultural Property, Washington, DC. (1993)
 2. AnnoSys portal, https://annosys.bgbm.fu-berlin.de/AnnoSys/AnnoSys, last accessed
    2017/07/18
 3. Berendsohn, W.G. (ed.): Access to biological collection data. ABCD Schema 2.06 – ratified
    TDWG Standard. Botanischer Garten und Botanisches Museum Berlin-Dahlem (BGBM), Freie
    Universität Berlin, Berlin (2007). http://www.bgbm.org/TDWG/CODATA/Schema/default.htm
 4. DwC terms homepage, http://rs.tdwg.org/dwc/terms/index.htm, last accessed 2017/07/18
 5. Tschöpe, O., Macklin, J.A., Morris, R.A. et al.: Annotating biodiversity data via the
    Internet. Taxon, 62, 1248–1258 (2013)
 6. Web Annotation Data Model homepage, https://www.w3.org/TR/annotation-model/, last
    accessed 2017/07/18
 7. Suhrbier, L., Kusber, W.-H., Tschöpe, O., Güntsch, A. & Berendsohn, W. G.: AnnoSys -
    implementation of a generic annotation system for schema-based data using the example
    of biodiversity collection data. Database (2017). doi:10.1093/database/bax018
 8. ABCD2 homepage, https://terms.tdwg.org/wiki/ABCD_2, last accessed 2017/09/07
 9. Protégé homepage, http://protege.stanford.edu/products.php, last accessed 2017/07/18

</pre>