<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring nanopublishing with COEUS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedro Lopes</string-name>
          <email>pedrolopes@ua.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pedro Sernadela</string-name>
          <email>sernadela@ua.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Luís Oliveira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DETI/IEETA, University of Aveiro</institution>
          ,
          <addr-line>Aveiro</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A nanopublication represents the smallest unit of publishable information. This schema enhances attribution and ownership of specific data elements. With these guidelines for relating atomic data to its authors, accessing and exchanging knowledge becomes a more streamlined process. Nanopublications are particularly relevant in the scientific domain, where publication, validation and ownership of data are essential. The COEUS semantic web application framework delivers, in a single package, all the tools required to rapidly build a new semantic knowledge base from scratch, including multiple data integration algorithms and interoperability services. This work introduces the combination of COEUS' integration and interoperability features with the nanopublications standard. The result is a unique nanopublishing pipeline, where collections of annotated data can be modeled and integrated, stored in a semantic knowledge base, and published through the COEUS API. These improvements to the COEUS framework greatly benefit the scientific community, where creating and publishing nanopublications is still cumbersome.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>nanopublications</kwd>
        <kwd>data integration</kwd>
        <kwd>application framework</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The Semantic Web [1] paradigm introduces multiple technologies and strategies
that are a perfect fit to represent real-world relationships in digital information
systems, namely in the life sciences. Semantic Web standards tackle challenges in the
most diverse domains, from data heterogeneity to service interoperability (or lack
thereof) [2]. The cornerstone of this flexibility is the concept of nanopublications, a
simple micro attribution strategy enabling the creation of machine-readable
knowledge assertions, empowering a new structure level for the huge amounts of
information flooding the scientific field [3, 4].</p>
      <p>With this standard still in its infancy, new tools are required to streamline the
generation and publishing of nanopublications. At present, this process is still
manual and relies on ad-hoc tools tailored to niche use cases.</p>
      <p>This work introduces a strategy that exploits COEUS' [5] features to streamline the
creation, storage and publishing of nanopublications. This new pipeline starts with a
translation process, integrating data from existing datasets into a semantic knowledge
base, and modeling it according to the nanopublications standard. Aggregated
knowledge is made available through COEUS' API, including REST services, a
SPARQL endpoint, LinkedData interfaces and a nanopublications URI interface.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>Migrating systems to a Semantic Web environment is no different than the
transition to previous paradigms. New technologies, algorithms and development strategies
are introduced, making this transition a cumbersome task. The COEUS framework
was built to overcome these challenges. The COEUS platform improves four key
features in the development of new Semantic Web applications: (1) the transition
from primitive to semantically enhanced systems; (2) the integration and triplification
of data; (3) the sharing of knowledge through interoperable interfaces; (4) the
deployment of a knowledge federation layer.</p>
      <p>COEUS' flexible integration engine improves traditional data warehousing
Extract-Transform-Load tasks, enabling the acquisition of data from heterogeneous resources
(in CSV, JSON, XML, SQL, SPARQL, RDF and LinkedData) and its translation to a
semantic data abstraction. The latter organizes knowledge in a cohesive structure,
based on Entity-Concept-Item relationships. COEUS' API comprises various methods
to access data (REST services, SPARQL endpoint, LinkedData interface, Java
methods), making them easily available for querying and integration in external systems.</p>
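      <p>As a hedged illustration of the SPARQL access route mentioned above, the following Python sketch builds a request URL following the standard SPARQL 1.1 Protocol, which passes the query text in a <monospace>query</monospace> parameter. The endpoint address and the helper name <monospace>build_sparql_request</monospace> are assumptions made for this example and are not taken from COEUS; a real deployment defines its own endpoint location.</p>
      <preformat><![CDATA[
```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address, used only for illustration.
ENDPOINT = "http://example.org/coeus/sparql"

# A generic query that lists ten arbitrary triples from the knowledge base.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_request(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```
]]></preformat>
      <p>The URL is only constructed, not sent, so the sketch runs without network access; any HTTP client can then issue the request against an actual endpoint.</p>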
      <p>Nanopublications expand existing Semantic Web strategies to standardize how one
can attribute provenance, authorship, publication information and further
relationships, always with the intention of stimulating information reuse. In a sense,
nanopublications are a natural response to the rapidly growing volume and complexity of
scientific data. With this standard, we can summarize published knowledge as a set of
individually identified assertions: the nanopublications.</p>
      <p>In summary, nanopublications are composed of three sections, each detailing
assertion information, authorship and provenance, for elaborate knowledge statements [6]
(Figure 1). Nanopublications are serializable through the interoperable RDF format,
opening the door to many new knowledge exchange possibilities and fostering their
retrieval and use. Moreover, with universal nanopublication identifiers, each
nanopublication can be cited and its impact tracked, encouraging compliance with
open Semantic Web standards. In addition to normalized positive assertions, Semantic
Web's expressiveness can also be leveraged to expose negative knowledge assertions.</p>
      <p>[Figure 1: image not recovered. Surviving labels: Nanopublication ID; Assertion (Subject, Predicate, Object); Provenance; Publication Info.]</p>
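      <p>To make the three-part structure more concrete, the following Python sketch assembles one nanopublication as four named graphs (a head graph plus assertion, provenance and publication information) and prints it in TriG syntax. It is a minimal illustration under stated assumptions, not the serialization produced by COEUS: the <monospace>np:</monospace> terms are taken from the community nanopublication schema, which this paper does not name explicitly, and the <monospace>ex:</monospace> namespace, the sample triples and the class name are hypothetical.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A triple of already-serialized terms: prefixed names, <IRIs> or "literals".
Triple = Tuple[str, str, str]

PREFIXES = {
    "np": "http://www.nanopub.org/nschema#",
    "prov": "http://www.w3.org/ns/prov#",
    "dcterms": "http://purl.org/dc/terms/",
    "ex": "http://example.org/",  # hypothetical namespace for the sample data
}


@dataclass
class Nanopublication:
    uri: str
    assertion: List[Triple] = field(default_factory=list)
    provenance: List[Triple] = field(default_factory=list)
    pubinfo: List[Triple] = field(default_factory=list)

    def _graph(self, name: str, triples: List[Triple]) -> str:
        """Serialize one named graph, identified by a fragment of the URI."""
        body = "\n".join(f"  {s} {p} {o} ." for s, p, o in triples)
        return f"<{self.uri}#{name}> {{\n{body}\n}}"

    def to_trig(self) -> str:
        me = f"<{self.uri}>"
        # The head graph links the nanopublication to its three sections.
        head = [
            (me, "a", "np:Nanopublication"),
            (me, "np:hasAssertion", f"<{self.uri}#assertion>"),
            (me, "np:hasProvenance", f"<{self.uri}#provenance>"),
            (me, "np:hasPublicationInfo", f"<{self.uri}#pubinfo>"),
        ]
        prefixes = "\n".join(f"@prefix {k}: <{v}> ." for k, v in PREFIXES.items())
        graphs = [
            self._graph("head", head),
            self._graph("assertion", self.assertion),
            self._graph("provenance", self.provenance),
            self._graph("pubinfo", self.pubinfo),
        ]
        return "\n\n".join([prefixes] + graphs)


uri = "http://example.org/np/1"  # hypothetical nanopublication identifier
nanopub = Nanopublication(
    uri=uri,
    assertion=[("ex:gene1", "ex:isAssociatedWith", "ex:disease1")],
    provenance=[(f"<{uri}#assertion>", "prov:wasDerivedFrom", "ex:dataset1")],
    pubinfo=[(f"<{uri}>", "dcterms:creator", '"Example Author"')],
)
trig = nanopub.to_trig()
print(trig)
```
]]></preformat>
      <p>Keeping each section in its own named graph lets provenance statements refer to the assertion graph as a whole, which is what makes the assertion individually attributable and citable.</p>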
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <p>The rationale behind this work is to extend the base COEUS framework with
support for nanopublications. This enables a new nanopublishing pipeline where all
tasks are automated. The process includes three general steps: (1) configuring the
data abstraction to a semantic model (Figure 2-1); (2) integrating and translating data
from external resources into the internal knowledge base (Figure 2-2); and (3) sharing the
nanopublications dataset (Figure 2-3). Adding nanopublishing support to COEUS
leverages its flexibility.</p>
      <p>Before the actual integration process, we need to set up where the data comes from
and how it will be represented according to the nanopublications format. The COEUS
setup allows organizing data according to a predefined hierarchy, based on an
Entity-Concept-Item structure. With the nanopublication extension, two new properties were
added to COEUS’ configuration to enable the automated creation of nanopublications.
The COEUS integration engine was updated to detect these settings and proceed
accordingly. The setup configuration changes are detailed next.</p>
      <p>• coeus:isNanopublication. This predicate can be applied to the Concept
configuration metadata, defining the base data imports for creating new nanopublications.
Where this property is enabled, COEUS’ integration engine automatically
generates new nanopublications and their respective URIs.</p>
      <p>• coeus:np_element. This predicate can be applied to the Resource configuration
metadata (for external resources associated with a nanopublication Concept),
defining the type of data being loaded: a new assertion, provenance or publication
information object.</p>
      <p>[Figure 2: image not recovered. Surviving labels: COEUS Knowledge Base; (1) COEUS Configuration; (2) Knowledge Integration &amp; Translation; (3) Nanopublications Access.]</p>
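      <p>The role of the two configuration properties can be sketched in Python as follows. This is a simplified illustration and not COEUS' integration engine: the configuration is flattened into plain dictionaries, the concept and resource identifiers are invented, and the literal values accepted by <monospace>coeus:np_element</monospace> ("assertion", "provenance", "pubinfo") are assumptions, since the text only states that the property distinguishes these three kinds of data.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List

# Assumed section names; the paper does not list the literal values
# accepted by coeus:np_element.
SECTIONS = ("assertion", "provenance", "pubinfo")

Plan = Dict[str, Dict[str, List[str]]]


def plan_sections(concepts: Dict[str, dict], resources: List[dict]) -> Plan:
    """Group resources by nanopublication section for each enabled Concept."""
    plan: Plan = {}
    for resource in resources:
        concept = resource["concept"]
        # Only Concepts flagged with coeus:isNanopublication yield nanopublications.
        if not concepts.get(concept, {}).get("coeus:isNanopublication", False):
            continue
        section = resource.get("coeus:np_element")
        if section not in SECTIONS:
            raise ValueError(f"{resource['id']}: unknown coeus:np_element {section!r}")
        per_concept = plan.setdefault(concept, {name: [] for name in SECTIONS})
        per_concept[section].append(resource["id"])
    return plan


# Hypothetical configuration: one nanopublication Concept and one regular Concept.
concepts = {
    "coeus:concept_Variant": {"coeus:isNanopublication": True},
    "coeus:concept_Gene": {},
}
resources = [
    {"id": "coeus:resource_variants", "concept": "coeus:concept_Variant",
     "coeus:np_element": "assertion"},
    {"id": "coeus:resource_sources", "concept": "coeus:concept_Variant",
     "coeus:np_element": "provenance"},
    {"id": "coeus:resource_authors", "concept": "coeus:concept_Variant",
     "coeus:np_element": "pubinfo"},
    {"id": "coeus:resource_genes", "concept": "coeus:concept_Gene"},
]

plan = plan_sections(concepts, resources)
print(plan)
```
]]></preformat>
      <p>Resources attached to a Concept without the nanopublication flag are skipped here, mirroring the idea that regular imports are unaffected by the extension.</p>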
      <p>In addition to these nanopublications-specific elements, the already existing dynamic
properties can still be used to add any predicate from any ontology to the newly
generated nanopublications.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>Nanopublications arise as a new strategy to cope with knowledge provenance,
ownership and sharing issues. The standard enables publishing comprehensive
datasets, even large ones, as collections of rich individual assertions.</p>
      <p>This work introduces an innovative nanopublishing pipeline. By extending
COEUS, we enable a Semantic Web framework to flexibly generate
nanopublication datasets. These results streamline the translation of data from primitive formats into a
semantic environment and, consequently, its delivery through open web interfaces.
Furthermore, COEUS is now a “turn-key” nanopublishing solution, making the
generation of nanopublication datasets much more agile.</p>
      <p>Acknowledgments. The research leading to these results has received funding
from the European Community (FP7/2007-2013) under ref. no. 305444 – the
RD-Connect project, and from the QREN "MaisCentro" program, ref.
CENTRO-07-ST24-FEDER-00203 – the CloudThinking project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lassila</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The Semantic Web</article-title>
          .
          <source>Sci Am</source>
          <volume>284</volume>
          ,
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ruttenberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bug</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samwald</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doherty</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forsberg</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kashyap</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kinoshita</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luciano</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marshall</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ogbuji</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rees</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stephens</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>G.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaccagnini</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hongsermeier</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herman</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheung</surname>
            ,
            <given-names>K.-H.H.</given-names>
          </string-name>
          :
          <article-title>Advancing translational research with the Semantic Web</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>8</volume>
          ,
          <fpage>S2</fpage>
          -
          <lpage>S2</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Velterop</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Nanopublications*: the future of coping with information overload</article-title>
          .
          <source>LOGOS: The Journal of the World Book Community</source>
          <volume>21</volume>
          ,
          <fpage>3</fpage>
          -
          <lpage>4</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Mons</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Haagen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chichester</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>den Dunnen</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Ommen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Mulligen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hooft</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roos</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hammond</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>The value of data</article-title>
          .
          <source>Nature Genetics</source>
          <volume>43</volume>
          ,
          <fpage>281</fpage>
          -
          <lpage>283</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lopes</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          :
          <article-title>COEUS: “semantic web in a box” for biomedical applications</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          <volume>3</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gibson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velterop</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>The anatomy of a nanopublication</article-title>
          .
          <source>Information Services and Use</source>
          <volume>30</volume>
          ,
          <fpage>51</fpage>
          -
          <lpage>56</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>