<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring the Generation and Integration of Publishable Scientific Facts Using the Concept of Nano-publications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amanda Clare</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samuel Croset</string-name>
          <email>croset@ebi.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christoph Grabmueller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Senay Kafkas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Liakata</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anika Oellrich</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dietrich Rebholz-Schuhmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>European Bioinformatics Institute</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Aberystwyth</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Publication formats are being sought that facilitate automatic processing and knowledge integration and are better suited to the current pace of research. Here we present an infrastructure for producing and consuming minimal publishable units, nano-publications, directly from a researcher's electronic notes or manuscripts which allow the integration of multiple resources. We describe a feedback loop resulting from the use of nano-publications, give a detailed example, and explain how this can be combined with existing web technologies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        as blogs and wikis. Wikis are collaborative tools that enable users to collect, share
and edit information, while blogs are incremental content management systems
that enable rapid publication of information. In traditional wikis and blogs,
accessing, querying, retrieving and aggregating data is difficult because the
knowledge is represented in unstructured form. These issues prompted the emergence of
semantic wiki systems like Semantic MediaWiki and DBpedia and commercial
or semi-commercial web content providers (e.g. Apture). However, such tools
are currently not aimed at scientists and do not offer the precision and level of
detail that scientists need to make their work unambiguous and available to others in a
machine-readable, reusable form. Importantly, it is not easy to receive
credit for statements on wikis or blogs, or to cite the information therein as
one would do with a standard publication. The nano-publication (NP) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] has
been proposed as a new form of academic publishing. Unlike other initiatives for
linking shared statements in the literature [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], an NP is defined as a citable unit
containing a set of annotated statements that capture knowledge in the form
of Resource Description Framework (RDF) triples, each consisting of three components
(subject, predicate, object) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The RDF graph emerging from the triples can
be identified by a name, an approach known as a ‘named graph’
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Despite the establishment of concepts such as NPs and named graphs, no
single approach has been identified that integrates the various forms
of publishing quickly and reliably. Here, we demonstrate how our
interpretation of NPs, named graphs, knowledge resources and existing web tools can
be combined to facilitate the integration of the diverse types of publishing and
potentially lead to the discovery of new knowledge.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Practical Example of NPs: Feedback Looping</title>
      <p>
        In this paper, we explore the generation and use of NPs [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] by means of a concrete
example. We define an NP as a set of one or more RDF statements that assert
some knowledge in the publishing author's field of expertise, or that demonstrate
the author's endorsement of a statement. While the RDF annotations may be
obtained automatically using text mining methods, they will have been manually
approved and collected into a set, constituting a new object, by the author of the
NP. In addition, it is important that an NP can be properly cited and its origin
(provenance) traced. Therefore, an NP should include an identifier, e.g. a digital
object identifier (DOI), while its provenance should include the author of the NP
and the origin of each of the statements included therein. Based on the above,
we propose a model for NPs, which is summarised in Figure 1.
      </p>
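      <p>The requirements listed above (an identifier, the author, manually approved statements, and an origin for each statement) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the field names, the example DOI, the author name and the notebook reference are hypothetical and are not part of the model in Figure 1.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Tuple

# An RDF statement as a (subject, predicate, object) tuple of identifier strings.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Statement:
    """One RDF statement together with the origin it was derived from."""
    triple: Triple
    origin: str             # e.g. a lab-notebook entry or a cited article
    approved: bool = False  # True once the NP author has manually approved it


@dataclass
class NanoPublication:
    """A citable set of approved statements with author-level provenance."""
    identifier: str  # citable identifier, e.g. a DOI
    author: str
    statements: List[Statement] = field(default_factory=list)

    def add(self, statement: Statement) -> None:
        # Text-mined annotations are allowed, but only after manual approval.
        if not statement.approved:
            raise ValueError("statement has not been approved by the author")
        self.statements.append(statement)

    def is_citable(self) -> bool:
        # Needs an identifier, an author, one or more statements,
        # and a recorded origin for every statement.
        return (
            bool(self.identifier)
            and bool(self.author)
            and len(self.statements) > 0
            and all(s.origin for s in self.statements)
        )


# Hypothetical identifier, author and origin, for illustration only.
example_np = NanoPublication(
    identifier="doi:10.0000/example-np-1", author="A. Researcher"
)
example_np.add(
    Statement(
        triple=("ex:isoproterenol", "ex:binds", "uniprot:P08913"),
        origin="lab-notebook/entry-12",
        approved=True,
    )
)
print(example_np.is_citable())
```
]]></preformat>
      <p>Rejecting unapproved statements in add is one possible way to encode the manual approval step; a real system might instead record who approved each statement and when.</p>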
      <p>The model is cyclic and based on dynamic interaction between users and the
machine. It consists of three steps, focusing on benefits for creators and users of
NPs. Firstly, researchers create the NP and offer it to the broader community (see
Figure 2). The second step involves machine consumption of the data generated
by users. Machines can integrate data from multiple sources as long as they are
represented in a common format with explicit semantics (RDF). For example,
statements generated by a user can be integrated with statements coming from
UniprotKB<sup>7</sup> or existing literature. Computers can also combine related NPs and
generate new hypotheses. The third step of the model is the user’s reward: feeds
generated from the data integration done by the machine in the second step. The
author of the NP will receive relevant information, based on previously published
assertions processed by the machine. The hypotheses can be evaluated, rejected
or validated by data added by the user, leading to a new NP, as described in
step one. The dynamic human-computer interaction allows NP writers to access
relevant information tailored to individualized retrieval, which they can enrich
for the benefit of the community. As NPs are uniquely identified by a DOI, they
can be cited and further used in any type of publication.</p>
    </sec>
    <sec id="sec-3">
      <title>Working Example of NPs Using Semantic Wikis</title>
      <p>
        We propose that the route to a scientific NP can be facilitated by enabling
the annotation of scientific notes or blogs at multiple levels of detail. The
author of the notes can then package together aspects of their notes as a scientific
publication. We have created a prototype of a tool for the open source wiki
MediaWiki, allowing the user to manually annotate a scientific document with
automated markup; multiple tags are allowed at the sentence level. This tool is
aimed at scientists who use MediaWiki as an electronic lab notebook
environment. Scientists could post a set of their annotated sentences as an NP. Additional
annotation (manual or automated) of important entities and relations between
entities can be provided for terms within these sentences. These can then be
saved as triples linked to the NP. A mockup screenshot of entity-level markup
can be found in Figure 3. The prototype of our tool can also model the scientific
discourse of the document or notes in terms of CoreSC [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], an annotation scheme
successfully used to automatically recognise core scientific concepts in research
articles. Thus we can retrieve the semantic context from which an NP has been
generated (Result, Conclusion, Hypothesis, etc.). Once generated, the RDF form
of the NP can then be exported and hosted on an external RDF hosting site.
Figure 2 demonstrates this process.
      </p>
      <p><sup>7</sup> http://www.uniprot.org/help/uniprotkb</p>
      <p>The following example illustrates the format of NPs including a simple
statement (Figure 4)<sup>8</sup>. The illustration is extended to show how a new hypothesis
can be generated from the user’s input.</p>
      <p>A researcher derives the
conclusion “isoproterenol binds to the Alpha-2 adrenergic receptor” and decides to
create an NP corresponding to the conclusion. In the semantically enriched
statement, isoproterenol is mapped to the ChEBI ontology<sup>9</sup>, the receptor
to UniprotKB, and the action binds to one of the terms of the NCI Thesaurus<sup>10</sup>.
Automated URI mappings can be achieved with services like the NCBO
BioPortal<sup>11</sup>. Once the data has been transformed to RDF, the NP is available for
further processing and integration into the network of linked data.</p>
      <p>Fig. 4. RDF representation of exemplary conclusion.</p>
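      <p>As a rough illustration of what the enriched statement might look like once serialised, the sketch below writes the conclusion as a single N-Quads line inside a named graph. It is not a reproduction of Figure 4. The UniProt prefix follows a commonly used URI pattern, while the example.org namespace, the graph name and the URIs for isoproterenol and for the binds relation are placeholders, because the actual ChEBI and NCI Thesaurus terms are not given in the text.</p>
      <preformat><![CDATA[
```python
# Commonly used UniProt URI prefix (an assumption; the text gives only P08913).
UNIPROT = "http://purl.uniprot.org/uniprot/"
# Hypothetical namespace standing in for terms whose real URIs are not given.
EX = "http://example.org/np/"


def to_nquads(triples, graph):
    """Serialise URI-only triples as N-Quads lines inside one named graph."""
    return "\n".join(f"<{s}> <{p}> <{o}> <{graph}> ." for s, p, o in triples)


conclusion = (
    EX + "isoproterenol",  # placeholder for the ChEBI URI of isoproterenol
    EX + "binds",          # placeholder for the NCI Thesaurus term
    UNIPROT + "P08913",    # receptor accession used in the text
)
np_graph = EX + "np-1"     # hypothetical name of the graph holding the NP

print(to_nquads([conclusion], np_graph))
```
]]></preformat>
      <p>Attaching the graph name to every statement is what lets provenance, such as the author and the identifier of the NP, be asserted about the set of statements as a whole.</p>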
      <p>The example in Figure 5 illustrates how a
generic query can be generated from the
NP presented above. The query aims to
retrieve other chemicals binding the
receptor and is run over a knowledge base.</p>
      <p>Fig. 5. Query statement.</p>
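      <p>The step from a published statement to a generic query can be sketched as follows: the subject of the statement is replaced by a wildcard and the resulting pattern is matched against a knowledge base. This is a minimal in-memory stand-in, not the query of Figure 5; in practice the same pattern would more likely be expressed in a query language such as SPARQL and run against a triple store. The function names, the example.org namespace and the contents of the toy knowledge base are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
UNIPROT = "http://purl.uniprot.org/uniprot/"
CHEBI = "http://purl.obolibrary.org/obo/CHEBI_"
EX = "http://example.org/np/"  # hypothetical namespace for placeholder terms


def generalise(triple, slot=0):
    """Turn a statement into a query pattern by blanking one slot (None = wildcard)."""
    pattern = list(triple)
    pattern[slot] = None
    return tuple(pattern)


def match(knowledge_base, pattern):
    """Return every triple that agrees with the pattern on its non-wildcard slots."""
    return [
        triple
        for triple in knowledge_base
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


BINDS = EX + "binds"          # placeholder for the NCI Thesaurus term
P08913 = UNIPROT + "P08913"   # receptor accession used in the text
np_statement = (EX + "isoproterenol", BINDS, P08913)

# Toy knowledge base; its contents are illustrative, not real database records.
knowledge_base = [
    np_statement,
    (CHEBI + "33568", BINDS, P08913),                       # adrenaline
    (EX + "other-chemical", BINDS, EX + "other-receptor"),  # unrelated entry
]

query = generalise(np_statement)  # any chemical that binds P08913
other_binders = [
    subject
    for subject, _, _ in match(knowledge_base, query)
    if subject != np_statement[0]
]
print(other_binders)
```
]]></preformat>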
      <p>The result is a list of known chemicals
binding P08913. The adrenaline molecule
(chebi:CHEBI_33568) will appear in the list and could be reported to the user.
Applying a reasoner will reveal that adrenaline and isoproterenol are both
members of the catecholamine family (chebi:CHEBI_33567). From this observation,
and in the absence of any other information in the databases, the following
hypothesis can be sent to the creator of the NP: “Do compounds from the
catecholamine family bind the P08913 receptor?”. The newly formed hypothesis can
give rise to new experiments and be published in return (Figure 6), citing both
the NP above and ChEBI, which are the basis of the hypothesis.</p>
      <p><sup>8</sup> For provenance terminology and concepts we follow: http://trdf.sourceforge.net/provenance/ns.html</p>
      <p><sup>9</sup> http://www.ebi.ac.uk/chebi/</p>
      <p><sup>10</sup> http://ncit.nci.nih.gov/</p>
      <p><sup>11</sup> http://bioportal.bioontology.org/</p>
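      <p>The reasoning step behind the catecholamine hypothesis can be sketched as a search for classes shared by all known binders of the receptor. The code below is a simplified stand-in for an ontology reasoner: it only follows is_a links transitively and intersects the resulting ancestor sets. The direct is_a links, the identifier used for isoproterenol and the function names are assumptions made for illustration; only the adrenaline and catecholamine identifiers and the wording of the hypothesis come from the example.</p>
      <preformat><![CDATA[
```python
def ancestors(term, is_a):
    """Collect all superclasses of a term by following is_a links transitively."""
    seen = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propose_hypotheses(binders, is_a, labels, target):
    """Phrase one question per class that is shared by every known binder."""
    if not binders:
        return []
    shared = set.intersection(*(ancestors(b, is_a) for b in binders))
    return [
        f"Do compounds from the {labels.get(c, c)} family bind the {target} receptor?"
        for c in sorted(shared)
    ]


# Toy hierarchy: the direct is_a links are a simplification of the ontology.
is_a = {
    "chebi:CHEBI_33568": ["chebi:CHEBI_33567"],  # adrenaline is_a catecholamine
    "ex:isoproterenol": ["chebi:CHEBI_33567"],   # placeholder identifier
}
labels = {"chebi:CHEBI_33567": "catecholamine"}

hypotheses = propose_hypotheses(
    ["chebi:CHEBI_33568", "ex:isoproterenol"], is_a, labels, "P08913"
)
print(hypotheses)
```
]]></preformat>
      <p>With a real ontology the intersection would also contain very general classes, so a practical version would need some way of preferring the most specific shared class before a question is sent to the author.</p>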
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We have described an infrastructure for creating NPs using web tools such as
blogs and wikis, while integrating information from a number of external
resources. We also demonstrated how the produced NPs can be extended via
the integration of information from other knowledge resources through querying
across available resources on a semantic layer. A researcher using this system can
decide whether or not to confirm the result of a query resulting from their initial
publication and publish it as a new NP. Enabling this strategy of publishing will
not only facilitate the integration of diverse resources but also allow for fast and
precise knowledge dissemination and retrieval.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Attwood</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          , et al.:
          <article-title>Utopia documents: linking scholarly literature with research data</article-title>
          .
          <source>Bioinformatics</source>
          <volume>26</volume>
          (
          <issue>18</issue>
          ),
          <fpage>i568</fpage>
          -
          <lpage>74</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Carroll</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          , et al.:
          <article-title>Named graphs, provenance and trust</article-title>
          .
          <source>International World Wide Web Conference</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gerstein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>Structured digital abstract makes text mining easy</article-title>
          .
          <source>Nature</source>
          <volume>447</volume>
          (
          <issue>7141</issue>
          ),
          <fpage>142</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gibson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stickler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The anatomy of a nanopublication</article-title>
          .
          <source>Information Services and Use</source>
          <volume>30</volume>
          (
          <issue>1</issue>
          ),
          <fpage>51</fpage>
          -
          <lpage>56</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Liakata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>Corpora for the conceptualisation and zoning of scientific papers</article-title>
          . In:
          <source>International Conference on Language Resources and Evaluation</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mons</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velterop</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Nano-publication in the e-Science era</article-title>
          .
          <source>Workshop on Semantic Web Applications in Scientific Discourse</source>
          (Jan
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Passant</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciccarese</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breslin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>SWAN/SIOC: aligning scientific discourse representation and social semantics</article-title>
          .
          <source>Workshop on Semantic Web Applications in Scientific Discourse</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>