<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Representing NCBO Annotator results in standard RDF with the Annotation Ontology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Soumia Melzi</string-name>
          <email>soumia.melzi@lirmm.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Clement Jonquet</string-name>
          <email>jonquet@lirmm.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Laboratory of Informatics, Robotics and Microelectronics of Montpellier (LIRMM) &amp; Computational Biology Institute (IBC) of Montpellier, University of Montpellier</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Semantic annotation is part of the Semantic Web vision. The Annotation Ontology is a model that has been proposed to represent any annotation in standard RDF. The NCBO Annotator Web service is a widely used service for annotation in the biomedical domain, offered within the BioPortal platform and giving access to more than 350 ontologies. This paper presents a new output format that represents the NCBO Annotator results in RDF with the Annotation Ontology. We briefly present both technologies and describe the mappings that enable the representation. A Java library is available to convert the current JSON outputs to RDF/XML. By rendering results in RDF, we make the annotations generated by the NCBO Annotator follow Semantic Web standards, which makes it possible, among other things, to offer them as linked data.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>biomedical ontologies</kwd>
        <kwd>semantic annotation</kwd>
        <kwd>NCBO Annotator</kwd>
        <kwd>RDF</kwd>
        <kwd>linked-data</kwd>
        <kwd>Annotation Ontology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the Semantic Web vision, semantic annotations enable data to be represented and
tagged with ontologies, thus facilitating data integration, interoperability, indexing and
search [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. To avoid the paradox in which the variety of annotation tools and
formats becomes as large as the data to annotate, we need common models
for representing and sharing semantic annotations independently of the tools or experts
that generated them. The Annotation Ontology (AO) is such a model; it can
represent any annotation in standard RDF [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In the following, we present how we
use AO to represent biomedical semantic annotations returned by the NCBO
Annotator. A similar contribution has already been offered inside the DOMEO annotation tool
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; however, our approach differs in that it provides a (Java) library that can be plugged into
any annotation post-processing pipeline. In addition, we explicitly describe the mappings
(Table 1), which allows anyone to extend the library with outputs in other formats such
as JSON-LD or other RDF syntaxes.
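      </p>
      <p>
        The NCBO Annotator is a web service for semantic annotation in the biomedical
domain, offered within the BioPortal platform and giving access to
more than 350 biomedical ontologies. The annotation workflow is based on a highly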
efficient syntactic concept recognition tool (using concept names and synonyms) and
on a set of semantic expansion algorithms that leverage the semantics in ontologies
(e.g., is-a relations and mappings). The Annotator is parameterizable to customize
the annotation process (ontologies to include, use of semantic expansion, stopwords,
longest match only, etc.) and when used as a web service through the REST API
(http://data.bioontology.org/documentation#nav_annotator), the service returns JSON
or XML outputs. The outputs include nothing about the data being annotated
(the inputs) or the parameters used, but they do provide, for each annotating concept, the
following metadata:
        <list list-type="bullet">
          <list-item><p>A description of the annotating concept (annotatedClass) with its URI, and references to its description, ontology, children, parents, descendants, ancestors, tree, notes, mappings and UI link in BioPortal;</p></list-item>
          <list-item><p>If applicable, a description of the parent concepts (hierarchy) that also annotate the data, with information about the ancestor level;</p></list-item>
          <list-item><p>If applicable, a description of the mapped concepts (mappings) that also annotate the data, with information about the inter-ontology mapping;</p></list-item>
          <list-item><p>The set of terms in the text data that generated the annotations, with, for each: the character position within the text (from &amp; to), the type of match (preferred name or synonym) and the exact piece of text that was matched.</p></list-item>
        </list>
      </p>
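      <p>As an illustration of the metadata listed above, the following Python sketch flattens an Annotator-like JSON result into one record per matched term. It is not the authors' Java library. The sample document is hypothetical, and the key names "@id", "annotations", "matchType" and "distance" are assumptions; only annotatedClass, hierarchy, mappings, from and to are named in the text.</p>
      <preformat><![CDATA[
```python
import json

# Hypothetical sample shaped like the output described above. The key names
# "@id", "annotations", "matchType" and "distance" are assumptions.
SAMPLE = """
[
  {
    "annotatedClass": {"@id": "http://example.org/onto#Neoplasms"},
    "hierarchy": [
      {"annotatedClass": {"@id": "http://example.org/onto#Disease"}, "distance": 1}
    ],
    "mappings": [],
    "annotations": [
      {"from": 15, "to": 23, "matchType": "PREF", "text": "NEOPLASMS"}
    ]
  }
]
"""


def flatten_annotations(raw_json):
    """Return one flat record per matched term in the JSON output."""
    records = []
    for result in json.loads(raw_json):
        concept_uri = result["annotatedClass"]["@id"]
        # Parent concepts that also annotate the data, if any.
        parents = [h["annotatedClass"]["@id"] for h in result.get("hierarchy", [])]
        for match in result.get("annotations", []):
            records.append({
                "concept": concept_uri,
                "parents": parents,
                "from": match["from"],
                "to": match["to"],
                "match_type": match["matchType"],
                "text": match["text"],
            })
    return records


records = flatten_annotations(SAMPLE)
for record in records:
    print(record["concept"], record["from"], record["to"], record["text"])
```
]]></preformat>
      <p>Keeping one record per matched term makes the later conversion to RDF straightforward, since each record then corresponds to one annotation and one selector.</p>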
      <p>
        The Annotation Ontology is an OWL ontology (http://purl.org/ao/) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] proposed by
Harvard Medical School researchers to represent any kind of semantic annotation,
generated either by automatic tools or by human experts, about any kind of data (text,
image, video, etc.). It offers provenance information as well as metadata about the
annotations. AO helps to address the need for open standards to index and represent
scientific data as linked data within the Semantic Web. It is a vocabulary, originally
inspired by the Annotea initiative of the early 2000s [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], that reuses
several de facto standards, e.g., for provenance (PAV), metadata (Dublin Core), people
(FOAF) and communities (SIOC). Figure 2 shows an example of an annotation represented
using AO. It has several parts: (i) the annotation provenance, which
includes information about the tool or person that created the annotation and
when; (ii) the document provenance, which includes information about the data being
annotated; (iii) the annotation topic, which represents the annotating concept with its
URI; (iv) the annotation type; and (v) the annotation selector, described below.
      </p>
      <p>AO defines several types of selectors to identify the part of the resource being annotated
(part of a text document, section of an image, audio excerpt, etc.). In the following, we
use the type aos:TextSelector, which identifies part of a text
document either by (i) specifying the number of characters from the beginning of the
document to the part being annotated (the offset) and the length of that part (OffsetRangeSelector), or by (ii)
specifying a short prefix and postfix phrase (PrefixPostfixSelector), the case illustrated in
Figure 2. AO also defines several types of annotations, such as note, erratum, example,
definition or SKOS-like qualifiers, and it is possible to subclass or combine those main types.
In particular, qualifiers are used when explicitly annotating with an RDF resource (with
a URI) and not just a tag. Figure 2 illustrates an ExactQualifier, generally used when
the object of the relationship ao:hasTopic represents exactly the portion of the
annotated document.</p>
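      <p>The two ways of selecting a text fragment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not AO's normative definition: the dictionary keys (offset, range, prefix, exact, postfix), the 0-based positions and the helper names are all hypothetical.</p>
      <preformat><![CDATA[
```python
def offset_range_selector(start, end):
    """Build an offset/range selector from a 0-based [start, end) span."""
    return {"type": "OffsetRangeSelector", "offset": start, "range": end - start}


def prefix_postfix_selector(text, start, end, context=12):
    """Build a selector from the matched text and `context` characters around it."""
    return {
        "type": "PrefixPostfixSelector",
        "prefix": text[max(0, start - context):start],
        "exact": text[start:end],
        "postfix": text[end:end + context],
    }


def resolve(text, selector):
    """Return the substring of `text` that a selector points to, or None."""
    if selector["type"] == "OffsetRangeSelector":
        begin = selector["offset"]
        return text[begin:begin + selector["range"]]
    # Prefix/postfix selectors are located by searching for their context.
    needle = selector["prefix"] + selector["exact"] + selector["postfix"]
    position = text.find(needle)
    if position < 0:
        return None
    begin = position + len(selector["prefix"])
    return text[begin:begin + len(selector["exact"])]


text = "Prevalence of neoplasms in adult patients"
start = text.index("neoplasms")
end = start + len("neoplasms")

by_offset = offset_range_selector(start, end)
by_context = prefix_postfix_selector(text, start, end)
print(resolve(text, by_offset), resolve(text, by_context))
```
]]></preformat>
      <p>An offset-based selector is compact but only valid for the exact text it was computed on, whereas a prefix and postfix selector can still be located if text is added elsewhere in the document.</p>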
    </sec>
    <sec id="sec-2">
      <title>NCBO Annotator results represented with AO</title>
      <p>The Annotator currently deals only with free-text data; we have therefore used
the aos:OffsetRangeSelector to describe the context of annotations. In addition, we
have used ExactQualifier for direct annotations made with a preferred name and
Qualifier otherwise.<sup>1</sup> Table 1 lists the other alignments between the AO ontology
and the Annotator annotation format.</p>
      <p>Because the Annotator outputs carry no information about the input data, we had to populate
the aof:annotatesDocument and ao:onSourceDocument properties ourselves, with an
automatically generated id concatenated to the pav:createdBy property value. Figure 3
shows the RDF output generated for the annotation with Neoplasms used previously.</p>
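      <p>The choices described in this section can be sketched in Python as below. This is a hedged illustration, not the Java library and not a reproduction of Table 1. It emits Turtle rather than RDF/XML for brevity. The namespace URIs other than http://purl.org/ao/, the properties ao:context, aos:offset and aos:range, the "PREF" match-type value, the ex: identifiers and the "-" separator in the document id are all assumptions.</p>
      <preformat><![CDATA[
```python
# Prefix URIs are assumptions, except http://purl.org/ao/, which the text cites.
PREFIXES = {
    "ao": "http://purl.org/ao/",
    "aos": "http://purl.org/ao/selectors/",
    "aot": "http://purl.org/ao/types/",
    "aof": "http://purl.org/ao/foaf/",
    "pav": "http://purl.org/pav/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/annotations/",  # hypothetical base for new resources
}


def literal(value):
    """Render a value as a plain quoted literal."""
    return '"%s"' % value


def annotation_to_triples(record, created_by, generated_id):
    """Map one flat annotation record to AO-style triples (illustrative only)."""
    # Document id: generated id concatenated to the creator value, as in the text.
    document = "ex:%s-%s" % (created_by, generated_id)
    annotation = "ex:annotation-%s" % generated_id
    selector = "ex:selector-%s" % generated_id
    # ExactQualifier for preferred-name matches, Qualifier otherwise.
    qualifier = "aot:ExactQualifier" if record["match_type"] == "PREF" else "aot:Qualifier"
    length = record["to"] - record["from"] + 1  # assumes inclusive positions
    return [
        (annotation, "rdf:type", qualifier),
        (annotation, "ao:hasTopic", "<%s>" % record["concept"]),
        (annotation, "aof:annotatesDocument", document),
        (annotation, "pav:createdBy", literal(created_by)),
        (annotation, "ao:context", selector),
        (selector, "rdf:type", "aos:OffsetRangeSelector"),
        (selector, "ao:onSourceDocument", document),
        (selector, "aos:offset", literal(record["from"])),
        (selector, "aos:range", literal(length)),
    ]


def to_turtle(triples):
    """Serialize prefixed-name triples as a small Turtle document."""
    header = ["@prefix %s: <%s> ." % item for item in sorted(PREFIXES.items())]
    body = ["%s %s %s ." % triple for triple in triples]
    return "\n".join(header + [""] + body)


record = {
    "concept": "http://example.org/onto#Neoplasms",
    "from": 15, "to": 23, "match_type": "PREF", "text": "NEOPLASMS",
}
triples = annotation_to_triples(record, "NCBOAnnotator", "0001")
turtle = to_turtle(triples)
print(turtle)
```
]]></preformat>
      <p>Keeping the mapping as a plain list of triples separates it from the serialization step, so another syntax such as JSON-LD could be produced from the same triples.</p>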
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>In this paper, we have presented our approach to offering the NCBO Annotator results in
RDF, represented with a reference ontology for the annotation of scientific data: the
Annotation Ontology. In addition to adopting a standard RDF format that facilitates the
release of annotations as linked data, users also gain access to all Semantic Web
technologies to display or query their annotations. Considering the large number of
annotations generated by different annotation tools, having them in RDF makes it possible to query them
semantically using, for instance, SPARQL or OWL descriptions. The parser that
transforms JSON outputs into RDF is available as a Java library that simply takes a JSON file
returned by the Annotator and produces an RDF/XML file. This Java library is
available only on request for now, but will be released in 2015. Our long-term perspective, within
the Semantic Indexing of French Biomedical Data Resources (SIFR) project
(http://www.lirmm.fr/sifr), is to offer a service endpoint implementing several improvements
(scoring, negation, disambiguation, new semantic expansion, new output formats, etc.)
of the NCBO Annotator, done with pre- and post-processing while still calling the
Annotator service. Future versions will also include the JSON-LD format, and we will also follow
the Open Annotation Data Model (http://www.openannotation.org/spec/core/), which is
currently a W3C draft.<fn id="fn1"><label>1</label><p>In the next version of the parser, we will use ExactQualifier for both preferred
name and synonym matches, BroadQualifier for is-a hierarchy annotations and
CloseQualifier for mapping-based annotations.</p></fn></p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>This work was supported in part by the French National Research Agency under the JCJC program, grant ANR-12-JS02-01001,
as well as by the University of Montpellier, CNRS and the Computational Biology Institute (IBC) of Montpellier. We thank the
National Center for Biomedical Ontology (NCBO) for the latest information about the Annotator.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ciccarese</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ocana</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castro</surname>
            ,
            <given-names>L.J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>An open annotation ontology for science on web 3.0</article-title>
          .
          <source>Biomedical Semantics</source>
          <volume>2</volume>
          (
          <issue>2</issue>
          :S4) (May
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ciccarese</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ocana</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Open semantic annotation of scientific publications using DOMEO</article-title>
          .
          <source>Biomedical Semantics</source>
          <volume>3</volume>
          (
          <issue>S1</issue>
          ) (April
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Handschuh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (eds.):
          <article-title>Annotation for the Semantic Web</article-title>
          ,
          <source>Frontiers in Artificial Intelligence and Applications</source>
          , vol.
          <volume>96</volume>
          . IOS Press (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Jonquet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>The Open Biomedical Annotator</article-title>
          . In: American Medical Informatics Association Symposium on Translational BioInformatics, AMIA-TBI'09. pp.
          <fpage>56</fpage>
          -
          <lpage>60</lpage>
          . San Francisco, CA, USA (March
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kahan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koivunen</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prud'Hommeaux</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swick</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          :
          <article-title>Annotea: an open RDF infrastructure for shared Web annotations</article-title>
          .
          In:
          <source>10th International World Wide Web Conference, WWW'01</source>
          . pp.
          <fpage>623</fpage>
          -
          <lpage>632</lpage>
          . Hong Kong (May
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whetzel</surname>
            ,
            <given-names>P.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dorf</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Griffith</surname>
            ,
            <given-names>N.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jonquet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubin</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Storey</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chute</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>BioPortal: ontologies and integrated data resources at the click of a mouse</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>37</volume>
          ,
          <fpage>170</fpage>
          -
          <lpage>173</lpage>
          (May
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>