<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Proceedings of the 4th Workshop on Linked Science (LISC2014): Making Sense Out of Data</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>(ISWC2014) Riva del Garda</institution>
          ,
          <addr-line>Trentino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Jun Zhao</institution>
          ,
          <addr-line>Marieke van Erp, Carsten Keßler, Tomi Kauppinen, Jacco van Ossenbruggen, Willem Robert van Hage</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>38</fpage>
      <lpage>79</lpage>
      <kwd-group>
        <kwd>Collocated with the 13th International Semantic Web Conference</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Editors</title>
      <p>Jun Zhao
Marieke van Erp
Carsten Keßler
Tomi Kauppinen
Jacco van Ossenbruggen
Willem Robert van Hage</p>
    </sec>
    <sec id="sec-2">
      <title>Preface</title>
      <sec id="sec-2-1">
        <title>Preface</title>
        <p>Traditionally, scientific dissemination has relied heavily on publications and
presentations. The findings reported in these articles are often backed by large amounts of diverse data
produced by complex experiments, computer simulations, and observations of physical
phenomena. Although publications, methods, and datasets are often related, this avalanche of
data makes it extremely hard to correlate, reuse, and leverage scientific data. Semantic Web
technologies provide a promising means for publishing, sharing, and interlinking data to
facilitate data reuse and the necessary correlation, integration, and synthesis of data across levels
of theory, techniques, and disciplines. However, even when these data become discoverable and
accessible, significant challenges remain in making sense of them and in achieving the
scientific discoveries we anticipate.</p>
        <p>The past three editions of the workshop (LISC2011, LISC2012, and LISC2013) have seen many novel ideas for
using Semantic Web technologies to integrate scientific data (for example, data about real
experiments or from simulations) and to enable reproducibility of research via online tools and Linked
Data. The theme for LISC2014 is “Making Sense out of Data Through Linked Science”. Here
we focus on new ways of discovering interesting patterns in scientific data, which could lead
to research validation, the identification of new hypotheses, and acceleration of the scientific
research cycle. We target both new results obtained through semantic reasoning and
innovative combinations of existing technologies (such as visualization, data mining, machine
learning, and natural language processing) with Semantic Web technologies to enable a better
understanding of data. One goal is to create both an incentive for scientists to consider the Linked Science
approach for their scientific data management and an incentive for technologists from different
disciplines to work together towards the vision of powering science with technology.</p>
        <p>LISC2014 was hosted at the 13th International Semantic Web Conference (ISWC2014) in
Riva del Garda, Trentino, Italy. Twenty-seven attendees enjoyed the opening keynote “Making
more sense out of social data” by Harith Alani (KMI, the Open University, UK), followed by
excellent presentations of the eight regular papers collected in these proceedings. We continued
the tradition of a “working” workshop with a plenary discussion on the challenges and
opportunities of using Semantic Web technologies for sense making. The results of this discussion are
published on Figshare and can be cited as:</p>
        <p>Overall, this edition continued to provide a successful forum for discussing how Semantic Web
technologies and Linked Data can help science. We would like to thank the entire program
committee for helping to assemble the program, and the attendees for their enthusiastic participation.
The LISC2014 co-organizers:</p>
        <p>Jun Zhao
Marieke van Erp
Carsten Keßler
Tomi Kauppinen
Jacco van Ossenbruggen
Willem Robert van Hage</p>
      </sec>
      <sec id="sec-2-2">
        <title>Program Committee</title>
        <p>Boyan Brodaric
Arne Broering
Paolo Ciccarese
Oscar Corcho
Aba-Sah Dadzie
Stefan Dietze
Mathieu d'Aquin
Daniel Garijo
Alasdair Gray
Paul Groth
Rinke Hoekstra
Krzysztof Janowicz
Simon Jupp
Tomi Kauppinen
Carsten Keßler
James Malone
Edgard Marx (additional reviewer)
Jeff Pan
Heiko Paulheim
Marieke van Erp
Willem van Hage
Jacco van Ossenbruggen
Amrapali Zaveri
Jun Zhao</p>
        <sec id="sec-2-2-1">
          <title>EPUB3 for Integrated and Customizable Representation of a Scientific Publication and its Associated Resources</title>
          <p>Hajar Ghaem Sigarchian, Ben De Meester, Tom De Nies, Ruben
Verborgh, Wesley De Neve, Erik Mannens, Rik Van de Walle</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>Semantic Lenses to Bring Digital and Semantic Publishing Together</title>
          <p>Angelo Di Iorio, Silvio Peroni, Fabio Vitali, Jacopo Zingoni</p>
        </sec>
        <sec id="sec-2-2-4">
          <title>Clustering Citation Distributions for Semantic Categorization and Citation Prediction</title>
          <p>Francesco Osborne, Silvio Peroni, Enrico Motta</p>
        </sec>
        <sec id="sec-2-2-5">
          <title>SMART Protocols: SeMAntic RepresenTation for Experimental Protocols</title>
          <p>Olga Giraldo, Alexander Garcia, Oscar Corcho</p>
        </sec>
        <sec id="sec-2-2-6">
          <title>LinkedPPI: Enabling Intuitive, Integrative Protein-Protein Interaction Discovery</title>
          <p>Laleh Kazemzadeh, Maulik R. Kamdar, Oya D. Beyan, Stefan Decker,
Frank Barry</p>
        </sec>
        <sec id="sec-2-2-8">
          <title>Using the Micropublications Ontology and the Open Annotation Data Model to Represent Evidence within a Drug-Drug Interaction Knowledge Base</title>
          <p>Jodi Schneider, Paolo Ciccarese, Tim Clark, Richard D. Boyce</p>
        </sec>
        <sec id="sec-2-2-10">
          <title>Capturing Provenance for a Linkset of Convenience</title>
          <p>Simon Jupp, James Malone, Alasdair J. G. Gray</p>
        </sec>
        <sec id="sec-2-2-11">
          <title>Connecting Science Data Using Semantics and Information Extraction</title>
          <p>Evan W. Patton, Deborah L. McGuinness</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>EPUB3 for Integrated and Customizable Representation of a Scientific Publication and its Associated Resources</title>
      <p>Hajar Ghaem Sigarchian<sup>1</sup>, Ben De Meester<sup>1</sup>, Tom De Nies<sup>1</sup>, Ruben Verborgh<sup>1</sup>,
Wesley De Neve<sup>1,2</sup>, Erik Mannens<sup>1</sup>, and Rik Van de Walle<sup>1</sup></p>
      <p>Abstract. Scientific publications point to many associated resources,
including videos, prototypes, slides, and datasets. However,
discovering and accessing these resources is not always straightforward: links
can break, readers may be offline, and the sheer number of associated
resources can make it difficult to keep track of the viewing order. In
this paper, we explore how such resources can be integrated into the
digital version of a scientific publication. Specifically, we evaluate the
most common scientific publication formats (PDF, HTML, EPUB2, and EPUB3) in terms of their capability
to implement the desirable attributes of an enhanced publication and to
meet the functional goals of an enhanced publication information
system. In addition, we present an
EPUB3 version of an exemplary publication in the field of computer
science that integrates and interlinks an explanatory video and an
interactive prototype. Finally, we introduce a demonstrator that is capable
of outputting customized scientific publications in EPUB3. By making
use of EPUB3 to create an integrated and customizable representation
of a scientific publication and its associated resources, we believe that
we are able to augment the reading experience of scholarly publications,
and thus the effectiveness of scientific communication.</p>
      <p>1 Introduction</p>
      <p>Scientific publications consist of more than text alone: they may also point to
many associated (binary) resources, including videos, prototypes, slides, and
datasets. Yet today, only the text of a scientific publication is
straightforward to access; the associated resources are often harder to reach. For
instance, readers may not always have an Internet connection at their disposal to
download related materials, and even when they do, links might become
broken over time. Furthermore, given their diverse nature, related materials
often need to be accessed in a different reading environment, such as a standalone
media player, causing readers to lose track of the scientific narrative.</p>
      <p>The 2007 Brussels Declaration<sup>3</sup> by the International Association of Scientific,
Technical and Medical (STM) Publishers states that “raw research data should
be made freely available” and that “one size fits all solutions will not work”.
In this paper, we illustrate that the ability to (adaptively) create an integrated
representation of a scientific publication and its associated resources contributes
to these goals. Specifically, we evaluate the most common scientific publication
formats (PDF, HTML, EPUB2, and EPUB3) in terms of their capability to implement the desirable attributes of an
enhanced publication and to meet the functional goals of an enhanced
publication information system. In addition, we
present an EPUB3 version of an exemplary publication in the field of computer
science that integrates and interlinks an explanatory video and an interactive
prototype. Finally, we introduce a demonstrator that is capable of outputting
customized scientific publications in EPUB3.</p>
      <p>The rest of this paper is structured as follows. In Section 2, we discuss a
number of current best practices among three scientific publishers, focusing on the
way open formats and their features are used to enhance scientific publications.
Next, in Section 3, we investigate to what extent PDF, HTML, EPUB2, and
EPUB3 facilitate the use of enhanced scientific publications and corresponding
information systems. In Section 4, we present an exemplary scientific publication
in EPUB3 that integrates an explanatory video and an interactive prototype.
In Section 5, we introduce our demonstrator for creating customized scientific
publications in EPUB3. Finally, in Section 6, we present our conclusions and a
number of directions for future work.</p>
      <p>2 Current Best Practices</p>
      <p>In this section, we briefly discuss a number of current best practices among three
scientific publishers, focusing on the way open formats are used to make available
scientific publications that have been enhanced with multimedia, interactivity,
and/or Semantic Web features.</p>
      <sec id="sec-5-1">
        <title>BioMed Central and Hindawi Publishing Corporation</title>
        <p>These publishers make scientific publications available in several formats: PDF, HTML, and
EPUB2. The HTML version of a publication can, for instance, be enhanced
with reusable data (e.g., supplementary datasets), while the EPUB2 version
only links to cited publications in EPUB2 format. However,
these publications do not contain any embedded interactive multimedia
content.</p>
        <p>
          Elsevier: Elsevier makes different versions of a scientific
publication available: PDF, HTML, MOBI, and EPUB2. In addition, authors are able to
deposit their datasets, making it possible for readers to access and download these
datasets [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Moreover, the EPUB2 version of a publication contains direct
links to the PDF versions of cited publications, rather than embedding these PDF
versions into the EPUB2 file. Furthermore, the EPUB2 version of a publication
does not contain any embedded interactive multimedia content.
        </p>
        <p><sup>3</sup> http://www.stm-assoc.org/brussels-declaration/</p>
        <p>In summary, none of the EPUB2 versions currently made available by BioMed Central,
Hindawi Publishing Corporation, and Elsevier embed interactive multimedia content for offline
usage (i.e., readers need network connectivity to access
all linked resources), nor do they contain Semantic Web features.</p>
        <p>3 Comparative Analysis of Publication Formats</p>
        <p>
          In recent years, a new open format for the distribution and interchange of digital
publications has emerged, called EPUB3 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This format can also be used in
the context of scientific publications. In what follows, we investigate to what
extent PDF, HTML, EPUB2, and EPUB3 are able to support the properties of
an enhanced scientific publication (that is, a scientific publication with
multimedia, interactivity, and/or Semantic Web features). To that end, we analyzed a
number of desirable attributes of an enhanced publication. We also
investigated the functional goals of an enhanced publication information system
(that is, the system that facilitates the authoring of enhanced publications).
        </p>
        <p>
          Thoma et al. [10] defined a core set of nine desirable attributes of an enhanced
publication: appearance, page transitions, in-page navigation, image browsing,
navigation to an embedded/linked media object, support for interactivity,
transmission, embedding and linking of multimedia/interactive objects, and document
integrity and structure. By considering both the attributes defined by
Thoma et al. [10] and a review of five existing enhanced publications,
Adriaansen et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] identified eleven attributes of an enhanced publication:
navigation by table of contents, metadata, links to figures and tables, attached data
resources, link from text to references, direct publication links from references,
reader comments, download as PDF, interactive content, relations, and cited by.
Furthermore, as argued in a talk by Ivan Herman<sup>4</sup>, bridging online and offline
access is a requirement for high-quality digital books, and consequently for high-quality
digital scientific publications, given that offline access enables users to access
supplementary information even when they do not have a network connection
at their disposal. Therefore, although none of the aforementioned research
efforts discusses this aspect, we also consider offline access to be a desirable attribute
of an enhanced publication.
        </p>
        <p>
          Besides the attributes of enhanced publications, we also considered data
model and information system aspects. Bardi et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] reviewed existing data
models for enhanced publications, taking into account structural and semantic
features, and proposed a classification scheme for enhanced publication
information systems based on their main functional goals. In this context, the authors
outline four major scientific motivations that explain the functional goals of an
enhanced publication information system: packaging with supplementary
material, improving readability and understanding, interlinking with research data,
and enabling repetition of experiments. Furthermore, we believe that portability
is also needed in order to preserve the availability of resources and their
interlinking, given that it enables users to access supplementary information
even in offline situations. Thus, an enhanced publication that has supplementary
resources needs to be a self-contained package. We therefore identified “portable
packaged file” as another desirable attribute of an enhanced publication.
        </p>
        <p><sup>4</sup> http://www.w3.org/2014/Talks/0411-Seoul-IH/Talk.pdf</p>
        <p>Finally, according to Liu [8], users need a hybrid solution for print
and digital resources. This means that, alongside the various digital publication
formats, print remains an important publication medium. As a result, we
consider “suitable for print” another desirable attribute of an enhanced publication.</p>
        <p>
          Ideally, an enhanced publication information system should be able to
support all the desirable attributes mentioned above. Considering the desirable
attributes of enhanced publications and the functional goals of enhanced
publication information systems, we mapped the attributes identified in [
          <xref ref-type="bibr" rid="ref2">10,2</xref>
          ] onto each
functional goal identified by Bardi et al. in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Our mapping can be found in
the first and second columns of Table 1. We can observe that nearly all desirable
attributes of an enhanced publication can be covered by the functional goals
of an enhanced publication information system, with the exception of the final
three attributes, for which we defined our own functional goals.
        </p>
        <p>Next, we investigated which scientific publication formats are the most
promising for covering both the desirable attributes of an enhanced publication and the
functional goals of an enhanced publication information system. We have
summarized our findings in the four rightmost columns of Table 1. Corresponding
explanatory notes can be found below.</p>
        <p>Packaging with supplementary material: This functional goal states that
it should be possible to add supplementary material to a scientific
publication. PDF can embed audio and video, but it does not support rich media
(e.g., media overlays). As such, it is not a suitable format for embedding
various types of associated resources (e.g., interactive content and standalone
applications). Consequently, PDF has limited support for this functional goal
and its underlying attributes. Note that extensions exist, such as export to a
PDF Portfolio in Adobe Acrobat<sup>5</sup>, that make it possible to combine related
materials. However, to the best of our knowledge, none of these extensions
allows embedding interactive content and standalone applications.
Furthermore, resources embedded in a PDF are not reusable, whereas the EPUB3
format lets users reuse embedded resources. In order to package
research data with an HTML file, all the dependencies need to be packaged
as well. While this is possible (e.g., using a zipped folder), there is no
standardized approach to do this, as opposed to EPUB2 and EPUB3. Therefore,
we do not consider HTML to be suitable for meeting this functional goal.
According to the EPUB2 specification [7], EPUB2 cannot embed
multimedia and interactive objects. Consequently, EPUB2 also offers limited support
for this functional goal. EPUB3, however, specifies no such restrictions.
As a result, we can conclude that EPUB3 is the only format that fully
supports this functional goal.</p>
        <p><sup>5</sup> http://www.adobe.com/products/acrobat/combine-pdf-files-portfolio.html</p>
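        <p>To illustrate the standardized packaging that distinguishes EPUB from an ad hoc zipped HTML folder, the following minimal Python sketch writes an EPUB container using only the standard library. It assumes the EPUB Open Container Format conventions (a mimetype entry stored first and uncompressed, and a META-INF/container.xml file pointing to the package document); the function names, the package document path OEBPS/content.opf, and the example file names are hypothetical, and the package document itself is assumed to be generated elsewhere.</p>
        <preformat preformat-type="code">
```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of META-INF/container.xml in the EPUB Open Container Format (OCF)
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
ET.register_namespace("", CONTAINER_NS)


def container_xml(opf_path: str) -> bytes:
    """Build META-INF/container.xml, which tells reading systems where the package document is."""
    root = ET.Element(f"{{{CONTAINER_NS}}}container", version="1.0")
    rootfiles = ET.SubElement(root, f"{{{CONTAINER_NS}}}rootfiles")
    ET.SubElement(
        rootfiles,
        f"{{{CONTAINER_NS}}}rootfile",
        {"full-path": opf_path, "media-type": "application/oebps-package+xml"},
    )
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def package_epub(target: Path, files: dict[str, bytes],
                 opf_path: str = "OEBPS/content.opf") -> None:
    """Write an EPUB container.

    'files' maps archive paths to their contents: the package document at
    'opf_path', XHTML content documents, and associated resources such as
    videos or the scripts of an interactive prototype.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # OCF: the mimetype entry must be the first entry and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", container_xml(opf_path),
                    compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```
        </preformat>
        <p>For example, calling package_epub with a mapping that contains the package document and a (hypothetical) OEBPS/media/explainer.mp4 entry would bundle the explanatory video with the publication, so that it remains available offline.</p>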
        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr>
                <th>Functional goal</th>
                <th>Attributes</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Packaging with supplementary material</td>
                <td>Embedding and linking of multimedia/interactive objects; Attached data resources; Document integrity and structure; Navigating to an embedded/linked media object</td>
              </tr>
              <tr>
                <td>Enabling repetition of experiments</td>
                <td>Native support for interactivity; Code execution; Interactive content</td>
              </tr>
              <tr>
                <td>Improving readability and understanding</td>
                <td>Navigation by table of contents; Reader comments; Appearance; Page transitions; In-page navigation; Image browsing; Links to figures and tables; Direct publication links from references; Cited by</td>
              </tr>
              <tr>
                <td>Interlinking with research data</td>
                <td>Metadata; Relations</td>
              </tr>
              <tr>
                <td>Portable packaged file</td>
                <td>Bridging online/offline; Transmission</td>
              </tr>
              <tr>
                <td>Suitable for print</td>
                <td>Download as PDF</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-5-2">
        <title>Enabling repetition of experiments</title>
        <p>This functional goal aims at enabling
researchers to (re-)execute experiments and/or demonstrators from within a
scientific publication. PDF has limited support for scripting and code
execution, which is not sufficient for building small
standalone applications that can act as interactive content (e.g., self-contained
widgets). As a result, PDF is not suitable for meeting this functional goal.
HTML is able to embed code (e.g., JavaScript). Moreover, thanks to the
inline frame (iframe) element, HTML can also be used as
an interface to other experiments. As EPUB2 does not support JavaScript,
it is not suited for repetition of experiments. Like HTML,
EPUB3 supports JavaScript, and thus meets this functional goal,
unless the experiments involved require, for instance, complex algorithms
running on clusters to obtain their results.</p>
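        <p>In EPUB3, a content document that contains scripting (or HTML form elements) is flagged in the package manifest through the scripted property. The following Python sketch, an illustration under our own assumptions rather than part of the described work, inspects an XHTML content document and emits the corresponding manifest item; the helper names are hypothetical, and only a representative subset of form elements is checked.</p>
        <preformat preformat-type="code">
```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
OPF_NS = "http://www.idpf.org/2007/opf"
# Elements that make a content document "scripted" in EPUB3: script plus
# HTML form elements (a representative, not exhaustive, selection).
SCRIPTED_TAGS = {f"{{{XHTML_NS}}}{name}"
                 for name in ("script", "form", "input", "button", "select", "textarea")}


def manifest_properties(xhtml: str) -> list[str]:
    """Return the manifest 'properties' values implied by an XHTML content document.

    Only 'scripted' is detected; other properties (e.g. 'nav', 'svg') are out of scope.
    """
    root = ET.fromstring(xhtml)
    if any(el.tag in SCRIPTED_TAGS for el in root.iter()):
        return ["scripted"]
    return []


def manifest_item(item_id: str, href: str, xhtml: str) -> ET.Element:
    """Create an 'item' element for the manifest of an EPUB3 package document."""
    item = ET.Element(f"{{{OPF_NS}}}item",
                      {"id": item_id, "href": href, "media-type": "application/xhtml+xml"})
    properties = manifest_properties(xhtml)
    if properties:
        item.set("properties", " ".join(properties))
    return item
```
        </preformat>
        <p>Note that, in EPUB3, reading-system support for scripting is optional, so an interactive prototype embedded this way may still benefit from a static fallback.</p>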
      </sec>
      <sec id="sec-5-3">
        <title>Improving readability and understanding</title>
        <p>
          PDF is a format designed for print, not for on-screen reading. While it is still undeniably the most suitable
format for print layout, in digital form it lacks device
independence [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], making it difficult to maintain readability on different screens.
Based on the PDF specification, we conclude that PDF offers only limited support for this
functional goal. HTML, EPUB2, and EPUB3, on the other hand, are suitable
for improving readability and understanding, because they can overcome the
aforementioned shortcomings of PDF (e.g., through reflowable layout).
        </p>
        <p>
          Interlinking with research data: In order to link
supplementary materials added to publications, (relational) metadata need to be taken
into account. PDF offers only coarse-grained metadata (e.g., title and
author information), and these metadata are not related to
interlinking supplementary materials. As a result, PDF is not suitable for meeting
this functional goal. HTML can be enriched for interlinking purposes using
Semantic Web formats and technologies [9] (e.g., RDF and OWL). EPUB2
has limited support for metadata. Furthermore, it does not allow
embedding multimedia and interactive content as supplementary research data.
Hence, EPUB2 is not suitable for meeting this functional goal. According to
the EPUB3 specification, EPUB3 supports metadata and the interlinking of research
data. In fact, it retains all functionality of (X)HTML5.
        </p>
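        <p>To make the metadata point concrete: an EPUB3 package document must at least carry a dc:identifier, a dc:title, a dc:language, and a dcterms:modified timestamp, and can add further Dublin Core elements such as dc:creator. The minimal Python sketch below builds such a metadata block with the standard library; the function name, the identifier id pub-id, and any example values are hypothetical.</p>
        <preformat preformat-type="code">
```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", DC_NS)


def package_metadata(identifier: str, title: str, creators: list[str],
                     language: str = "en") -> ET.Element:
    """Build the 'metadata' element of an EPUB3 package document."""
    md = ET.Element(f"{{{OPF_NS}}}metadata")
    # Required: identifier (referenced by the package's unique-identifier), title, language
    ET.SubElement(md, f"{{{DC_NS}}}identifier", id="pub-id").text = identifier
    ET.SubElement(md, f"{{{DC_NS}}}title").text = title
    ET.SubElement(md, f"{{{DC_NS}}}language").text = language
    # Optional, but useful for discoverability
    for name in creators:
        ET.SubElement(md, f"{{{DC_NS}}}creator").text = name
    # Required: last-modified timestamp in the form CCYY-MM-DDThh:mm:ssZ
    modified = ET.SubElement(md, f"{{{OPF_NS}}}meta", property="dcterms:modified")
    modified.text = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return md
```
        </preformat>
        <p>Richer, interlinking-oriented statements (for instance, using the ontologies discussed below) would be layered on top of this required minimum.</p>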
        <p>Apart from a suitable format, interlinking supplementary materials requires
suitable ontologies. Fortunately, many candidates for both general and
domain-specific interlinking purposes are already available. For example, schema.org
is an ontology that is suitable for use in a variety of domains, including the
description of events and creative works. It can thus be used to semantically
enhance publications, and it can also be extended by other ontologies.
Furthermore, Standard Analytics<sup>6</sup> aims at turning scholarly publications into
an interface to a web of data, making use of existing web ontologies.
Moreover, Structural, Descriptive, and Referential (SDR)<sup>7</sup> is an ontology for
representing academic publications, related artifacts (e.g., videos, slides, and
datasets), and referential metadata. This ontology can generically describe
interactive and multimedia resources. In addition, any publication
can use general ontologies such as the Citation Typing Ontology (CiTO)<sup>8</sup>,
the Bibliographic Ontology (BIBO)<sup>9</sup>, and the Common European Research
Information Format (CERIF)<sup>10</sup>. Finally, publications may also need to make
use of ontologies that are specific to their research domains (e.g., in the
medical domain, the Infectious Disease Ontology (IDO)<sup>11</sup> could be used).</p>
        <p><sup>6</sup> https://standardanalytics.io/</p>
        <p><sup>7</sup> http://onlinelibrary.wiley.com/doi/10.1002/asi.23007/full</p>
        <p><sup>8</sup> http://www.essepuntato.it/lode/http://purl.org/spar/cito</p>
        <p><sup>9</sup> http://bibliontology.com/</p>
        <p><sup>10</sup> http://helios-eie.ekt.gr/EIE/bitstream/10442/13864/1/IJMSO_2014_CERIF_authorFinalVersion.pdf</p>
        <p><sup>11</sup> http://infectiousdiseaseontology.org/page/Main_Page</p>
        <p>Portable packaged file: PDF has limited support for packaging interactive
content and standalone applications. Furthermore, it cannot bridge the gap
between online and offline usage. Indeed, PDF is primarily an offline format for print,
and any interactive parts are lost when a publication is printed. As
mentioned before, HTML lacks a proper packaging structure, making this
format an unsuitable candidate for meeting this functional goal. A similar
remark holds for EPUB2, as this format does not support
embedding interactive multimedia resources. As EPUB3 has extensive
support for embedding interactive multimedia resources, it can be considered a
suitable format for creating portable packaged files. Ideally, all types of
resources could be embedded in a packaged file, regardless of their
size; here, EPUB3 falls short. Embedding large datasets
can make an EPUB3 file very large, causing portability
and readability issues. We discuss a possible solution to this issue in
Section 5.</p>
        <p>Suitable for print: Currently, PDF is the only format suitable for print.
Although HTML, EPUB2, and EPUB3 can also be used for the purpose of
print, they have been designed for on-screen reading and currently cannot
match the high typesetting demands of print publications.</p>
        <p>As can be seen in Table 1, EPUB3 is the format that supports the most desirable
attributes of an enhanced publication and the most functional goals of an enhanced
publication information system. Only PDF is suitable for print output, given that
HTML and EPUB2/EPUB3 have been primarily designed for screen output, typically
resulting in a layout that is suboptimal for print. As a workaround for
this problem, the EPUB2/EPUB3 and HTML versions of a publication can embed
or link to the PDF version of the publication.</p>
        <p>4 Proof-of-Concept: A Scientific Publication in EPUB3</p>
        <p>In this section, we demonstrate how EPUB3 can be used to create an integrated
representation of a scientific publication and its associated resources. To that
end, we enhanced the “Everything is Connected” publication [11], a paper
authored by ourselves and a number of colleagues, by embedding an explanatory
video and an interactive prototype. The resulting proof-of-concept is available for
download<sup>12</sup>. We used Readium<sup>13</sup> as our electronic reading system, since it
supports most features of EPUB3. As illustrated by Figure 1, our proof-of-concept
shows how a publication can act as an interface to different types of research
outputs. Note that, instead of adding a link to the online version of the
interactive prototype, we made use of an iframe to allow immediate access to the
interactive prototype from within the publication, so that the reader does not need
to switch to a different reading environment.</p>
        <p><sup>12</sup> http://multimedialab.elis.ugent.be/users/hghaemsi/EnhancedPublication.epub</p>
        <p><sup>13</sup> http://readium.org/</p>
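        <p>The proof-of-concept files are not reproduced here, but the following Python sketch shows, under our own assumptions, what an XHTML content document embedding an explanatory video and an iframe-hosted interactive prototype could look like when generated with the standard library. The file paths and the function names are hypothetical, and the prototype is assumed to be packaged inside the EPUB so that it stays available offline.</p>
        <preformat preformat-type="code">
```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)


def _add(parent: ET.Element, tag: str, **attrs: str) -> ET.Element:
    """Append an XHTML child element with the given attributes."""
    return ET.SubElement(parent, f"{{{XHTML_NS}}}{tag}", attrs)


def resource_page(title: str, video_href: str, prototype_href: str) -> str:
    """Serialize an XHTML content document embedding a video and an iframe."""
    html = ET.Element(f"{{{XHTML_NS}}}html")
    head = _add(html, "head")
    _add(head, "title").text = title
    body = _add(html, "body")
    video = _add(body, "video", src=video_href, controls="controls")
    # Fallback text for reading systems without video support
    video.text = "Video playback is not supported by this reading system."
    # The prototype is displayed in place, without leaving the reading environment
    _add(body, "iframe", src=prototype_href, title="Interactive prototype",
         width="100%", height="480")
    # Write explicit end tags (no self-closing iframe), which is safer for HTML-aware renderers
    return ET.tostring(html, encoding="unicode", short_empty_elements=False)
```
        </preformat>
        <p>Because the iframe points to a relative path inside the package rather than to an online URL, the prototype remains reachable without a network connection, which is the property the comparison in Section 3 emphasizes.</p>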
        <p>Furthermore, we semantically enhanced our exemplary EPUB3 publication
by making use of schema.org, a general ontology that allows describing books
and articles, among other creative works. Thanks to properties such as embedUrl,
description, and contentUrl, schema.org makes it possible to indicate how
a resource is related to the target EPUB3 publication in a straightforward way.
We illustrate this in Figure 2. Note that schema.org is supported by major
search engines such as Bing, Google, Yahoo!, and Yandex. However, at the time
of writing, these search engines did not yet support
indexing EPUB3 publications (or reading the metadata available within
these publications).</p>
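        <p>As a hedged illustration only (the serialization used in the proof-of-concept and in Figure 2 is not specified here), the Python sketch below produces a schema.org description in JSON-LD, in which a ScholarlyArticle points to its associated resources through associatedMedia, each resource carrying description, contentUrl and, where applicable, embedUrl. The helper names and file paths are hypothetical.</p>
        <preformat preformat-type="code">
```python
import json
from typing import Optional


def media_object(schema_type: str, description: str, content_url: str,
                 embed_url: Optional[str] = None) -> dict:
    """Describe one associated resource as a schema.org MediaObject (or a subtype such as VideoObject)."""
    node = {"@type": schema_type, "description": description, "contentUrl": content_url}
    if embed_url is not None:
        node["embedUrl"] = embed_url
    return node


def article_jsonld(headline: str, media: list[dict]) -> str:
    """Serialize a ScholarlyArticle that links its associated resources via associatedMedia."""
    doc = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "headline": headline,
        "associatedMedia": media,
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)
```
        </preformat>
        <p>Equivalent statements could also be expressed as microdata or RDFa attributes directly in the XHTML content documents; JSON-LD is used here only because it keeps the example compact.</p>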
        <p>5 Creating Customized EPUB3 Publications</p>
        <p>In the previous sections, we explained how supplementary materials can be
embedded into a scientific publication. As mentioned before, embedding all relevant
supplementary materials in a portable packaged file is not always cost-effective
and/or desirable for a user. Since the size of an EPUB3 file depends on the
size of all embedded resources, it will not be lightweight in all use cases, e.g.,
when embedding large datasets. On the one hand, a
packaged file should not suffer from portability and other usage issues caused by its size.
On the other hand, the advantages of having a portable packaged publication
are outweighed if the entire publication cannot be distributed properly. Users may not need all embedded
supplementary materials and may instead wish to have their own customized, lightweight publication. For
instance, large datasets or high-resolution images can be
kept in a remote repository instead of being embedded in the portable
packaged file. An environment for outputting customized publications allows users
to select which supplementary materials to embed.
Hence, users can determine the size of the EPUB3 file themselves. That way, the
problem of distributing overly large publications is solved, and only the content
that the user needs is distributed. The only disadvantage of this approach is
the added complexity at the distribution side (i.e., at the platform of the
publisher). However, most publishers already have an extensive online distribution
infrastructure, which could easily be expanded with an interface such as the one
we propose. For example, publishers such as Elsevier offer different formats of
a publication to users. In particular, on Elsevier's ScienceDirect website,
users can select their preferred format.</p>
        <p>To illustrate this concept of customizable publications, we implemented a
basic demonstrator in which a user first selects the relevant supplementary
material using a web interface, after which a customized EPUB3 publication is
output. Figure 3 shows the user interface of our online demonstrator. Content
selection is done entirely on the client side, based on the HTML representation
of a publication. The selected content is then packaged as an EPUB3 file on
the server side. The resulting demonstrator is available online<sup>14</sup>. Note that the
author of a publication can determine which elements are customizable simply
by adding the class customizable to the desired HTML elements.</p>
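        <p>The demonstrator performs the selection on the client side; purely as an illustration of the underlying idea, the Python sketch below takes an XHTML document and the set of element ids a reader selected, and drops every element carrying the class customizable that was not selected, before the result would be packaged. Identifying selections by element id, and the function names, are our own assumptions.</p>
        <preformat preformat-type="code">
```python
import xml.etree.ElementTree as ET


def is_customizable(element: ET.Element) -> bool:
    """True if the element's class attribute contains the token 'customizable'."""
    # class may hold several space-separated names, e.g. "figure customizable"
    return "customizable" in element.get("class", "").split()


def keep_selected(xhtml: str, selected_ids: set[str]) -> str:
    """Remove every customizable element whose id the reader did not select."""
    root = ET.fromstring(xhtml)
    # ElementTree has no parent pointers, so visit each potential parent and
    # filter its children; snapshot the traversal before mutating the tree.
    for parent in list(root.iter()):
        for child in list(parent):
            if is_customizable(child) and child.get("id") not in selected_ids:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```
        </preformat>
        <p>Content without the customizable class is always retained, so authors keep control over what is optional; the filtered document would then be packaged as an EPUB3 file, as in the demonstrator.</p>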
        <p>
          Ideally, the implemented functionality for outputting customized publications
in EPUB3 would be integrated into an authoring environment, where authors
and publishers could indicate which elements of a publication are customizable.
In previous work, we implemented such an authoring environment for the
collaborative creation of enriched e-Books using EPUB3 [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. It allows authors
and publishers to create an electronic publication with all required material
embedded. This publication can then be exported as an EPUB3 file. In future
work, we aim to showcase a version of this authoring environment that is integrated
with a customizable distribution platform as described above.
        </p>
        <p>6 Conclusions and Future Work</p>
        <p>In this paper, we demonstrated that the increasingly popular EPUB3 format
can be used to create integrated representations of a scientific publication and
its associated resources. We believe that this contributes to a
better reading experience and more effective scientific communication (e.g., support
for the inclusion of explanatory videos and interactive prototypes should enable
authors to better transfer their knowledge and experience). In addition, we
indicated that an EPUB3 version of a scientific publication can be used as a primary
version, from which other versions of the scientific publication can be reached
(e.g., a PDF version for print), thereby allowing legacy content to persist.</p>
        <p>We can identify a number of directions for future research. First, user-friendly
authoring tools are needed that make it easy to create enhanced scientific
publications, and that allow these publications to act as an interface to different
research outputs. We have already started taking steps in this direction.
Second, these authoring tools need to support different output formats, in order
to meet the needs of both readers who read on paper and readers who
read digitally. Third, these authoring tools also need to make it possible
to easily add metadata to EPUB3 versions of scientific publications, so that
EPUB3 versions of scientific papers can achieve the same degree of discoverability
as PDF and HTML versions. Finally, it would be interesting to investigate the
good practices of novel publication repositories such as PLOS ONE, Figshare,
and ResearchGate.</p>
        <p>The research activities described in this paper were funded by Ghent University,
iMinds (a research institute founded by the Flemish Government), the Institute
for Promotion of Innovation by Science and Technology in Flanders (IWT), the
FWO-Flanders, and the European Union.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Semantic lenses to bring digital and semantic publishing together</title>
      <p>Angelo Di Iorio1, Silvio Peroni1,2, Fabio Vitali1, and Jacopo Zingoni1
Abstract. Modern scholarly publishers are taking steps towards
semantic publishing, i.e. the use of Web and Semantic Web technologies to
represent formally the meaning of a published document by specifying
information about it as metadata and publishing that metadata as Open Linked
Data. In this paper we introduce a way to use a particular
semantic publishing model, called semantic lenses, to semantically enhance a
published journal article. In addition, we present the main features of
TAL, a prototypical application that enables the navigation and
understanding of a scholarly document through these semantic lenses, and we
describe the outcomes of a user testing session that demonstrates the
efficacy of TAL when addressing tasks requiring deeper understanding
of, and fact-finding on, the content of the document.
1</p>
      <p>Introduction
In parallel with the evolution of the Web through Semantic Web
technologies, modern publishers (and in particular scholarly publishers) are taking
steps towards enhancing digital publications with semantics, an approach
known as semantic publishing [22]. In brief, semantic publishing is the
use of Web and Semantic Web technologies to represent formally the meaning
of a published document by specifying a large quantity of information about
it as metadata and publishing that metadata as Open Linked Data. Confirming
this trend, the Nature Publishing Group (publisher of Nature), the
American Association for the Advancement of Science (publisher of Science) and
Oxford University Press have all recently announced initiatives to open their articles’
reference lists and to publish them as Open Linked Data3,4,5.
3 Nature.com Linked Data: http://data.nature.com.
4 http://opencitations.wordpress.com/2012/06/16/science-joins-nature-in-opening-reference-citations
5 http://opencitations.wordpress.com/2012/06/22/oxford-university-press-to-support-open-citations</p>
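      <p>The following sketch illustrates what opening a reference list as Linked Data can amount to by serialising an article's references as N-Triples. It assumes the CiTO property cito:cites (http://purl.org/spar/cito/cites) as the citation link. The vocabularies actually used by the publishers mentioned above are not specified here, and the function name is hypothetical.</p>
      <preformat><![CDATA[
```python
CITO_CITES = "http://purl.org/spar/cito/cites"


def reference_list_to_ntriples(article_iri: str, cited_iris: list) -> str:
    """Serialise an article's reference list as N-Triples using cito:cites.

    IRIs are assumed to be absolute and already valid; this sketch does
    not escape or validate them. Duplicate references are emitted once,
    in their original order.
    """
    seen = set()
    lines = []
    for cited in cited_iris:
        if cited in seen:
            continue
        seen.add(cited)
        lines.append(f"<{article_iri}> <{CITO_CITES}> <{cited}> .")
    return "\n".join(lines) + "\n"
```
]]></preformat>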
      <p>However, the enhancement of a traditional scientific paper with semantic
annotations is not a straightforward operation, since it involves much more than
simply making semantically precise statements about named entities within the
text. In [17], we have shown that several relevant points of view exist beyond
the bare words of a scientific paper, such as the context of the publication, its
structural components, its rhetorical structures (e.g. Introduction, Results,
Discussion), or the network of citations that connects the publication to its wider
context of scholarly works. These points of view are usually so tightly integrated
into the paper as a whole, and into the rhetorical flow of its natural-language text,
that the reader can scarcely discern them as separate entities. We also
proposed the separation of these aspects into eight different sets of machine-readable
semantic assertions (called semantic lenses), each of which describes one of the following
(from the most contextual to the most document-specific): research context,
authors’ contributions and roles, publication context, document structure, rhetorical
organisation of discourse, citation network, argumentative characterisation of
text, and textual semantics.</p>
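      <p>The ordering of the eight lenses, from the most contextual to the most document-specific, can be captured as a simple enumeration. The sketch below is illustrative only. The member names are ours rather than terms of the Lens Application Ontology, and the split into context-related and content-related lenses assumes, following Section 3, that the first three lenses form the context-related group.</p>
      <preformat><![CDATA[
```python
from enum import Enum


class Lens(Enum):
    """The eight semantic lenses, ordered from the most contextual
    to the most document-specific (names are illustrative)."""
    RESEARCH_CONTEXT = 1
    CONTRIBUTIONS_AND_ROLES = 2
    PUBLICATION_CONTEXT = 3
    DOCUMENT_STRUCTURE = 4
    RHETORIC = 5
    CITATION_NETWORK = 6
    ARGUMENTATION = 7
    TEXTUAL_SEMANTICS = 8

    @property
    def group(self) -> str:
        # Assumption: the first three lenses are context-related,
        # the remaining five are content-related.
        return "context" if self.value in (1, 2, 3) else "content"


def lenses_in(group: str) -> list:
    """Return the lenses of one group, preserving the contextual ordering."""
    return [lens for lens in Lens if lens.group == group]
```
]]></preformat>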
      <p>How can the theory of semantic lenses be used to effectively extend the
semantic publishing capabilities of publishers? To answer this
question, in this paper we introduce a prototypical HTML interface to scholarly
papers called TAL (Through A Lens), which enables the navigation of a text
document to which semantic lenses have been applied to make all the
corresponding information explicit. This HTML interface is meant to be a proof of
concept of semantic lenses in a real-case scenario. We also performed a user testing
session that demonstrates the efficacy of TAL when addressing tasks requiring
deeper understanding of, and fact-finding on, the content of the document.</p>
      <p>The rest of the paper is organised as follows. In Section 2 we introduce
some significant works related to semantic publishing experiences and models. In
Section 3 we show an application of semantic lenses onto a particular scholarly
article. In Section 4 we introduce TAL describing its main features, while in
Section 5 we discuss the outcomes of a user testing session we performed to
assess the usability and effectiveness of TAL. Finally, in Section 6 we conclude
the paper and sketch out some future work.
2</p>
      <p>Related works
Much current literature concerns both proofs of concept for semantic
publishing applications and models for the description of digital publishing from
different perspectives. Because of this richness, here we present just some of the
most important and significant works on these topics.</p>
      <p>In [22], Shotton et al. describe their experience in enriching scholarly papers
with provenance information, scientific data, bibliographic references, interactive
maps and tables, and in providing appropriate Web interfaces for them, with
the intention of highlighting the advantages of semantic publishing to a broader
audience. Along the same lines, Pettifer et al. [19] discuss the pros
and cons of the various formats for the publication of scholarly articles and
propose an application for the semantic enhancement of PDF documents according
to established ontologies.</p>
      <p>
        A number of vocabularies for the description of research projects and related
entities have been developed, e.g. the VIVO Ontology6 (developed for describing
the social networks of academics, their research and teaching activities, their
expertise, and their relationships to information resources), the Description
Of A Project7 (an ontology with multilingual definitions that contains terms
specific to software development projects), and the Research Object suite of
ontologies [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (for linking together scientific workflows, the provenance of their
executions, interconnections between workflows and related resources such as datasets
and publications, and social aspects related to such scientific experiments).
      </p>
      <p>
        One of the most widely used ontologies for describing bibliographic entities
and their aggregations is BIBO, the Bibliographic Ontology [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. FRBR,
Functional Requirements for Bibliographic Records [10], is another, more
structured model for describing documents and their evolution in time. One of the
most important aspects of FRBR is the fact that it is not tied to a particular
metadata schema or implementation.
      </p>
      <p>Several works have been proposed in the past to model the rhetoric and
argumentation of papers. For instance, the SALT application [9] allows someone
such as the author “to enrich the document with formal descriptions of claims,
supports and rhetorical relation as part of their writing process”. Other
works offer an application of Toulmin’s model [23] within
specific scholarly domains, for instance the legal and legislative domain [11].
A good review of other Semantic Web models for the description of
arguments can be found in [21].
3</p>
      <p>The Semantic Lenses
In [17], we claimed that the semantics of a document is definable from different
perspectives, where each perspective is represented as a semantic lens that is
applied to a document to reveal a particular semantic facet. In this section we
briefly summarise our theory. A full example of the lenses applied to the well-known
paper “Ontologies are us: A unified model of social networks and semantics” [14]
is available at http://www.essepuntato.it/lisc2014/lens-example.</p>
      <p>Lenses are formalised in the LAO ontology8. In addition, since the application
of the semantic lenses to a document is an authorial activity, i.e. the action of
a person (the original author or anyone else) taking responsibility for a
semantic interpretation of the document, we also record the provenance of the
semantic statements according to the Provenance Ontology (PROV-O) [12].</p>
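      <p>A minimal sketch of how the provenance of a lens application could be recorded with PROV-O, emitted as N-Triples. The set of statements produced by one lens is treated as an entity attributed to the person responsible for it. Only standard PROV-O terms (prov:wasAttributedTo, prov:Person, prov:generatedAtTime) are used. The LAO terms that would link the statements to a specific lens are deliberately left out, and the named-graph IRI, the agent IRI and the function name are hypothetical.</p>
      <preformat><![CDATA[
```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(value: str) -> str:
    return f"<{value}>"


def lens_provenance(statements: str, agent: str, when: datetime) -> str:
    """N-Triples recording who made a set of lens statements, and when.

    `statements` is the IRI of a (hypothetical) named graph holding the
    statements produced by applying one lens; `agent` is the person who
    takes responsibility for that interpretation. Naive datetimes are
    assumed to be in UTC.
    """
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    triples = [
        (iri(statements), iri(PROV + "wasAttributedTo"), iri(agent)),
        (iri(agent), iri(RDF_TYPE), iri(PROV + "Person")),
        (iri(statements), iri(PROV + "generatedAtTime"),
         f'"{when.isoformat()}"^^{iri(XSD_DATETIME)}'),
    ]
    return "".join(" ".join(t) + " .\n" for t in triples)
```
]]></preformat>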
      <p>Figure 1 summarises the overall conceptual framework. The lenses are
organised in two groups: context-related, which describe the elements contributing to
the creation and development of a paper, and content-related, which describe
the content of the paper itself from different angles.</p>
      <p>Writing a scientific paper is usually the final stage of an often complex,
collaborative and multi-domain research activity from
which the paper arises. The organizations involved, the people affiliated with these
organizations and their roles and contributions, the grants provided by funding
agencies, the research projects funded by such grants, the social context in which
a scientific paper is written, the venue within which a paper appears: all these
provide the research context that leads, directly or indirectly, to the genesis of
the paper, and awareness of them may have a strong impact on the credibility
and authoritativeness of its scientific content.
6 VIVO Ontology: http://vivoweb.org/ontology/core
7 DOAP: http://usefulinc.com/ns/doap
8 Lens Application Ontology (LAO): http://www.essepuntato.it/2011/03/lens.</p>
      <p>Three lenses are designed to cover these aspects:
– Research context: the background from which the paper emerged (the
research described, the institutions involved, the sources of funding, etc.). To
describe this contextual environment we use FRAPO, the Funding, Research
Administration and Projects Ontology9.
– Contributions and roles: the individuals claiming authorship of the paper
and the specific contributions each made. We use SCoRO (the Scholarly
Contributions and Roles Ontology10) and its imported ontology PRO (the
Publishing Roles Ontology11) [18] to describe these aspects.
9 FRAPO: http://purl.org/cerif/frapo
10 SCoRO: http://purl.org/spar/scoro
11 PRO: http://purl.org/spar/pro</p>
      <p>LinkedPPI: Protein-Protein Interaction Discovery</p>
      <p>rate-limiting step to our data-warehousing approach for centralised analysis. We
have proposed a domain-specific model which can accommodate the needs of the
field of PPI modelling. The use of a domain-specific model and an interactive
graph-based exploration platform for search and aggregative visualisation makes
our integration approach more intuitive for the actual users who deal with PPI
predictions. We have also proposed a set of three user scenarios depicting how
the LinkedPPI framework could be used for the prediction of potential interactions
between proteins, domains and genomic regions.
6</p>
      <p>Future Work
The approach presented in this work is used to extract
valuable information about PPI networks, domain-domain interactions
and selective genomic interactions. However, the observations obtained from
such data retrieval are raw; with further analysis, they could become a valuable
asset for simulations and prediction methods. As part of future work, we
intend to apply statistical analysis to assess the significance of such observations,
in order to develop a classifier that can predict interacting
and non-interacting protein pairs.</p>
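      <p>One possible form of the planned significance analysis, offered as an assumption rather than the authors' method, is a one-sided hypergeometric (enrichment) test. Given a set of candidate protein pairs, some of which are known to interact, it estimates how surprising it is that a given observation, such as a shared domain-domain pair, co-occurs with known interactions as often as it does. The function below is a hypothetical sketch using only Python's math.comb; the resulting p-values could then serve as features for a later classifier.</p>
      <preformat><![CDATA[
```python
from math import comb


def enrichment_p_value(k: int, population: int, successes: int, draws: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    population: candidate protein pairs considered
    successes:  pairs known to interact
    draws:      pairs showing a given observation (e.g. a shared domain pair)
    k:          pairs showing the observation that are known to interact
    """
    upper = min(successes, draws)
    # Sum the exact hypergeometric probabilities of k, k+1, ..., upper hits.
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / comb(population, draws)
```
]]></preformat>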
      <p>Acknowledgements This work has been done under the Simulation Science
program at the National University of Ireland, Galway. SimSci is funded by
the Higher Education Authority under the program for Research in Third-level
Institutions and co-funded under the European Regional Development Fund.</p>
    </sec>
    <sec id="sec-7">
      <title>Using the Micropublications ontology and the</title>
    </sec>
    <sec id="sec-8">
      <title>Open Annotation Data Model to represent evidence within a drug-drug interaction knowledge base</title>
      <p>Jodi Schneider1, Paolo Ciccarese2, Tim Clark2, and Richard D. Boyce3
1 INRIA Sophia Antipolis France</p>
      <p>jodi.schneider@inria.fr
2 Massachusetts General Hospital and Harvard Medical School
paolo.ciccarese@gmail.com; tim clark@harvard.edu
3 University of Pittsburgh</p>
      <p>rdb20@pitt.edu
Abstract. Semantic web technologies can support the rapid and
transparent validation of scientific claims by interconnecting the assumptions
and evidence used to support or challenge assertions. One important
application domain is medication safety, where more efficient acquisition,
representation, and synthesis of evidence about potential drug-drug
interactions is needed. Potential drug-drug interactions (PDDIs), defined
as two or more drugs for which an interaction is known to be possible,
are a significant source of preventable drug-related harm. The
combination of poor-quality evidence on PDDIs and a general lack of PDDI
knowledge among prescribers results in many thousands of preventable
medication errors each year. While many sources of PDDI evidence exist to
help improve prescriber knowledge, they are not concordant in their
coverage, accuracy, and agreement. The goal of this project is to research
and develop core components of a new model that supports more
efficient acquisition, representation, and synthesis of evidence about
potential drug-drug interactions. Two Semantic Web models—the
Micropublications Ontology and the Open Annotation Data Model—have great
potential to provide linkages from PDDI assertions to their supporting
evidence: statements in source documents that mention data, materials,
and methods. In this paper, we describe the context and goals of our
work, propose competency questions for a dynamic PDDI evidence base,
outline our new knowledge representation model for PDDIs, and discuss
the challenges and potential of our approach.
1</p>
      <p>Introduction
Scientific knowledge depends on the verification and integration of large systems
of interconnected assertions, assumptions, and evidence. These systems are
continually growing and changing, as new scientific studies are completed and new
documents are published. The state of current knowledge in any given domain
can be difficult for any one individual to grasp fully, because pieces of knowledge
are updated frequently.</p>
      <p>In the biosciences, this problem has taken on particular importance, due
to an exponential growth in the aggregate publication rate. Manually curated
databases are used to record certain types of knowledge. To update and maintain
these databases, curators must make knowledge-intensive decisions, identifying
the best available evidence in the current scientific literature. Maintaining such
databases is challenging because the sources of the recorded information are
tracked only to a limited extent.</p>
      <p>In an ongoing project, we are experimenting with using the Micropublications
Ontology4 [Clark2014] and the Open Annotation Data Model5 [W3C2013] to
create an audit trail between assertions, evidence, and source documents, so
that assertions and evidence can be flagged for update in flexible and intelligent
ways. Updates may be needed when the underlying sources change, when a
particular method for establishing an assertion is discredited, etc. Our goal is
to provide better linkages between an assertion recorded in a knowledge base
and its supporting evidence (i.e., data, materials, and methods) found in source
documents.</p>
      <p>In the remainder of the paper, we describe the competency questions for
our evidence base and the new evidence model that we are creating, which
combines the Micropublications Ontology and the Open Annotation Data Model,
and adapts them to the existing evidence modeling of the Drug Interaction
Knowledge Base6 [Boyce2007,Boyce2009]. We then reflect on how the new model
performs for our goal of creating an audit trail between assertions, evidence, and
source documents.
2</p>
      <p>Context and goals
Our work is in the context of a larger project on organizing and synthesizing
scientific evidence from the biomedical literature on potential drug-drug
interactions. Potential drug-drug interactions (PDDIs), defined as two or more drugs
for which an interaction is known to be possible, are a significant source of
preventable drug-related harm (i.e., adverse drug events, or ADEs). The
combination of poor-quality evidence on PDDIs and a general lack of PDDI
knowledge among prescribers results in many thousands of preventable medication
errors each year. While many sources of PDDI evidence exist to help improve
prescriber knowledge, they are not concordant in their coverage [Saverno2011],
accuracy [Wang2010], and agreement [Abarca2003]. Difficulties with
synthesizing evidence, and gaps in the scientific knowledge of PDDI clinical relevance,
underlie such disagreement.
4 http://purl.org/mp/
5 http://www.openannotation.org/spec/core/
6 http://purl.net/net/drug-interaction-knowledge-base/</p>
      <p>To address these problems, our research group is studying the potential
benefit of applying recent developments from the Semantic Web community on
scientific discourse modeling and open annotation. The goal is to develop core
components of a new PDDI knowledge representation model that will support more
efficient acquisition, representation, and synthesis of PDDI evidence. The desired
knowledge representation will provide better linkages between PDDI assertions
and their supporting evidence, by directly connecting to annotated section(s) of
relevant source documents.
3</p>
      <p>Approach
Our new approach will draw upon the current version (1.2) of the Drug
Interaction Knowledge Base [Boyce2007,Boyce2009], the Open Annotation Data
Model [W3C2013], and the Micropublications Ontology [Clark2014].</p>
      <p>The Drug Interaction Knowledge Base (DIKB) is a static, manually
constructed evidence base that indexes assertions and evidence about PDDIs involving over 60
drugs. Its taxonomy of assertion types and evidence types [Boyce2014] is a
starting point for the new knowledge base. The current version of the DIKB
implements a version of the SWAN semantic discourse ontology [Ciccarese2008] to
represent evidence relations. Specifically, the knowledge base uses
swanco:citesAsSupportingEvidence and swanco:citesAsRefutingEvidence to link to an entire
source document as a supporting or refuting citation. At the time the DIKB
1.2 was constructed (2007–2009), annotation methodologies were less well
developed. Consequently, version 1.2 of the DIKB stores quotes as textual strings
manually copied from source documents. The text has been enriched with
metadata about the source section, but it is non-trivial to return to the appropriate
segment of the text from this information.</p>
      <p>Our use of the Open Annotation Data Model (OA) reflects a change in the
state of the art. OA is “an interoperable framework for creating associations
between related resources, annotations, using a methodology that conforms to
the Architecture of the World Wide Web”7. In particular, OA allows an evidence
database to provide explicit connections from quotes to their source documents.
For example, as shown in Figure 1, an OA resource can be used to quote a specific
part of a drug product label (also known as a summary of product characteristics)
to indicate evidence that escitalopram inhibits CYP2D6. In general, OA enables
queryable links from selections of source documents (as target) to the
instances of data, methods, and materials (as body) that we want to model to
support drug interaction knowledge base use cases.</p>
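      <p>The following sketch shows, under stated assumptions, how an OA annotation can keep a quote connected to its source. The target is an oa:SpecificResource whose oa:TextQuoteSelector records the exact quoted text with a short prefix and suffix, and the body points to the data, method or material being evidenced. The dictionary layout is a JSON-LD-like approximation using OA Core term names, not a validated serialization, and the helper names make_annotation and locate are hypothetical. Unlike a quote stored as a plain string, the selector can be re-anchored in the source document even after the surrounding text has moved.</p>
      <preformat><![CDATA[
```python
def make_annotation(source: str, text: str, start: int, end: int,
                    body: str, context: int = 32) -> dict:
    """Build an OA-style annotation whose target quotes text[start:end]."""
    selector = {
        "@type": "oa:TextQuoteSelector",
        "oa:exact": text[start:end],
        "oa:prefix": text[max(0, start - context):start],
        "oa:suffix": text[end:end + context],
    }
    return {
        "@type": "oa:Annotation",
        "oa:hasBody": body,  # e.g. the IRI of an mp:Data or mp:Method instance
        "oa:hasTarget": {
            "@type": "oa:SpecificResource",
            "oa:hasSource": source,  # e.g. the IRI of a drug product label
            "oa:hasSelector": selector,
        },
    }


def locate(selector: dict, text: str):
    """Return the start offset of the quote in text, or None if not found."""
    exact = selector["oa:exact"]
    prefix = selector.get("oa:prefix", "")
    suffix = selector.get("oa:suffix", "")
    position = text.find(prefix + exact + suffix)
    if position != -1:
        return position + len(prefix)
    # Fall back to the quote alone if its surrounding text has changed.
    position = text.find(exact)
    return position if position != -1 else None
```
]]></preformat>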
      <p>Similarly, the Micropublications Ontology improves the depth with which
evidence can be represented and queried. The most important feature of the
Micropublications model, in our view, is its ability to represent the data, methods,
and materials that act as support for a claim, and to transitively close chains
of claims8 and citations across the literature to their fundamental supporting
evidence. An mp:Micropublication mp:argues an mp:Claim based on connecting
any number of mp:Representations. The whole Micropublication is a
Representation, as are Data and Methods (including Materials and Procedures), whether
textual or pictorial. An mp:Representation may mp:support or mp:challenge any
other mp:Representation, making the evidence explicit and queryable.
4
To design an appropriate enhancement of the DIKB model with
Micropublications and the Annotation Ontology, we need to understand what sorts of
questions experts would like to retrieve about the PDDIs. The competency questions
below were elicited from experienced editors of clinically oriented drug
compendia during the process of developing DIKB 1.2. Most fall into three categories:
finding assertions and evidence; assessing the evidence; and enabling updates. A
second area of interest is statistical information about the evidence base, which
is useful for various analytics related to knowledge base maintenance.
4.1</p>
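      <p>Transitively closing chains of claims to their fundamental supporting evidence, as described for the Micropublications model, amounts to a reachability walk over mp:supports links. The sketch below assumes the links are available as simple (supporter, supported) pairs and treats as fundamental any supporting representation that is not itself supported. The function name and the pair representation are assumptions, and mp:challenges links could be handled with a similar walk.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict, deque


def supporting_evidence(supports, claim):
    """Transitively collect everything that supports `claim`.

    `supports` is an iterable of (x, y) pairs meaning "x mp:supports y".
    Returns (all_supporters, fundamental), where fundamental supporters
    are those that are not themselves supported by anything.
    """
    supporters = defaultdict(set)
    for x, y in supports:
        supporters[y].add(x)

    seen = set()
    queue = deque([claim])
    while queue:  # breadth-first walk backwards along mp:supports
        node = queue.popleft()
        for supporter in supporters[node]:
            if supporter not in seen:  # also guards against cycles
                seen.add(supporter)
                queue.append(supporter)

    fundamental = {node for node in seen if not supporters[node]}
    return seen, fundamental
```
]]></preformat>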
      <p>Finding assertions and evidence
1. Understanding evidence coming from a given study:
(a) What data, methods, and materials are reported in evidence item X?
(b) Which evidence items are related to and follow up on evidence item X?
(c) Which research group conducted the study used for evidence item X?
(d) Are the evidence use assumptions for evidence item X concordant? Unique?
Non-ambiguous?
2. Verifying plausibility of an evidence item:
(a) Has evidence item X been rejected for assertion Y? If so, why and by
whom?
(b) Which other assertions are being supported/challenged by this evidence
item?
(c) What are the assumptions required for use of this evidence item to
support/refute assertion X?
3. Checking assertions about pharmacokinetic parameters (e.g., the area
under the concentration-time curve (AUC)):
(a) How many pharmacokinetic studies used for evidence items in the DIKB
could be used to support or refute an assertion about pharmacokinetic
parameter X (e.g., ‘X increases AUC’)?
(b) How many pharmacokinetic studies in the DIKB used for evidence items
for assertion X are based on data from the product label?
(c) What is the result of averaging (or applying some other statistical
operation to) the values for pharmacokinetic parameter X across all relevant
studies used for evidence items?
4. Checking for differences in the product labeling:
(a) Are there differences in the evidence items that were identified across
different versions of product labeling for the same drug?
(b) What version of product labeling was used for evidence item X? Original
manufacturer or repackager? Most current label or outdated? Is the drug
on the market in country X or not? American or country X?
8 ‘Assertion’ in DIKB terminology corresponds to a ‘Claim’ in the Micropublications
model; this variation in terms is because the term ‘claim’ is used in a different sense
in medical billing.</p>
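      <p>Competency question 3(c) could be answered along the lines of the following sketch, which summarises the reported values of one pharmacokinetic parameter (for example, AUC ratios) across the studies behind the evidence items. The input format and the choice of summaries are assumptions: the geometric mean is a common summary for ratios, but which statistical operation the DIKB editors would apply is not specified.</p>
      <preformat><![CDATA[
```python
import statistics


def summarize_parameter(observations):
    """Summarise the values of one pharmacokinetic parameter across studies.

    `observations` is a list of (study_id, value) pairs, e.g. AUC ratios
    (with / without the interacting drug) taken from evidence items.
    Returns the number of distinct studies plus arithmetic and geometric means.
    """
    values = [value for _, value in observations]
    if not values:
        raise ValueError("no observations for this parameter")
    return {
        "n_studies": len({study for study, _ in observations}),
        "arithmetic_mean": statistics.fmean(values),
        "geometric_mean": statistics.geometric_mean(values),
    }
```
]]></preformat>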
      <p>Supporting updates to evidence and assertions
1. Changing status of redundant and refuted evidence:
(a) Remove an older version of a redundant evidence item
(b) Change the modality of a supporting evidence item to be a refuting
evidence item</p>
      <sec id="sec-8-1">
        <title>2. Updating when key sources change:</title>
        <p>(a) Get all assertions that are supported by evidence items identified from
an FDA guidance or other source document just released as an updated
version.
4.4</p>
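        <p>Answering this update question requires following the audit trail from a source document back to assertions. The sketch below assumes the evidence base is available as (subject, predicate, object) triples with CURIE-style predicate names, where annotations point to source documents through oa:hasTarget and oa:hasSource and their bodies are evidence items linked to claims by mp:supports. The function name and this exact graph shape are assumptions based on the model described in this paper.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def assertions_affected_by(triples, updated_source):
    """Return the claims whose supporting evidence was quoted from
    `updated_source`, following the assumed graph shape
    annotation -oa:hasTarget-> target -oa:hasSource-> document and
    annotation -oa:hasBody-> evidence -mp:supports-> claim."""
    triples = list(triples)  # iterated several times below
    objects = defaultdict(set)  # (subject, predicate) -> set of objects
    for s, p, o in triples:
        objects[(s, p)].add(o)

    targets = {s for s, p, o in triples
               if p == "oa:hasSource" and o == updated_source}
    annotations = {s for s, p, o in triples
                   if p == "oa:hasTarget" and o in targets}
    evidence = set()
    for annotation in annotations:
        evidence |= objects[(annotation, "oa:hasBody")]
    claims = set()
    for item in evidence:
        claims |= objects[(item, "mp:supports")]
    return claims
```
]]></preformat>
        <p>Here the support link is followed only one step; evidence that supports a claim indirectly, through intermediate statements, would need a transitive walk over mp:supports links.</p>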
        <p>Understanding the evidence base
1. Statistical information about the evidence base:
(a) Number of assertions in the system
(b) Number of evidence items for and against each assertion type
(c) Show the distribution of the levels of evidence for various assertion types
(e.g., pharmacokinetic assertions)
5 Modeling evidence about drug-drug interactions
would represent some of the evidence supporting and challenging the assertion
escitalopram does not inhibit CYP2D6. We created the example by hand using
a sample assertion and evidence items from DIKB version 1.2 (see footnote 9).
</p>
        <p>[Figure 2 (diagram): micropublication MP1 (hasAttribution: RDB, May 14) argues claim C1, “Escitalopram does not inhibit CYP2D6”, semantically qualified by DRON:0001858, dikb:does_not_inhibit and PRO:00006121. Statements S1, S2 and S3, data D1 and method Me1 are quoted from the LEXAPRO (escitalopram oxalate) tablet label (Forest Labs, 12/2012; Ref 1) and are connected by supports and challenges relations; qualifiers SQ1 to SQ6 carry these semantic qualifiers and the DIKB evidence types Non-traceable drug label evidence, EV_Data_CT_DDI and EV_Method_CT_DDI.]</p>
        <p>Fig. 2. A model of the evidence for and against the assertion escitalopram does not
inhibit CYP2D6. This is based on the Micropublications ontology and reuses the
evidence taxonomy (dikbEvidence), terms (dikb), and data from the DIKB. The Drug
Ontology (DRON) and Protein Ontology (PRO) are reused in semantic qualifiers. A
more detailed view of Method Me1 is shown in Figure 1.</p>
        <p>The Micropublications ontology is used to structure the evidence relating to
data, methods, and materials, and the overall indication that evidence mp:supports
or mp:challenges a mp:Claim. We qualify Claims (C1 in the figure) by reusing
identifiers from DRON10 [Hanna2013] and the Protein Ontology11 [Natale2011].
The new model reuses the DIKB evidence taxonomy12 to provide epistemic
qualification (SQ2, SQ5, SQ6 in the figure) to statements (S1, S2, and S3 in the
figure), data (D1 in the figure), methods (Me1 in the figure), and materials (not
shown in this example). The Open Annotation Data Model (previously shown in
Figure 1) is used to link quotes taken from source documents back to their
originating information artifacts. The approach to modeling other DIKB assertions
would be similar to this example.
6
Certain benefits accrue from upgrading from the current DIKB. Many of the
competency questions (Section 4) are not supported in the DIKB 1.2. The new
model is designed to support these and additional questions relevant in the
domain. Visual inspection of the model suggests that we will be able to answer
some competency questions quite naturally: in particular, finding the assertions
that are not supported by evidence already in the evidence base, the evidence
that should be checked most thoroughly (e.g., evidence that by itself supports
multiple assertions), and the data, methods, and materials associated with a
given evidence item as described in source documents.</p>
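        <p>Two of these questions reduce to simple aggregations over support links, as in the sketch below. It assumes the direct evidence-to-claim mp:supports links are available as (evidence, claim) pairs; the function names and the threshold of two claims are illustrative choices.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def unsupported_claims(claims, supports):
    """Claims with no incoming mp:supports link in the evidence base."""
    supported = {target for _, target in supports}
    return sorted(set(claims) - supported)


def evidence_to_check_first(supports, minimum=2):
    """Evidence items that by themselves support `minimum` or more claims,
    most widely used first; errors in these would propagate furthest."""
    counts = Counter(source for source, _ in set(supports))  # dedupe links
    return [item for item, n in counts.most_common() if n >= minimum]
```
]]></preformat>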
        <p>Further, as a Linked Data resource, our new knowledge base will also enable
innovative queries using knowledge from other sources about tagged entities (i.e.,
drugs and proteins) represented in the evidence base. Unlike the current DIKB,
we will be able to render annotations in their original context. We also expect to
be able to support distributed community annotation/curation, since MP and
OA take account of provenance, and since OA is being increasingly adopted by
a variety of annotation tools.
Our project does raise certain modeling challenges. To date, MP has not been
used to represent both unstructured claims and the related logical sentences.
Figure 1 shows the assertion escitalopram does not inhibit CYP2D6 as unstructured
text. However, the DIKB requires 1) that assertions about PDDIs be formulated
by experts prior to collecting evidence, and 2) that the assertions be represented
both as unstructured statements and as sentences in a logical formalism. Careful
thought is being put into how to properly accommodate this use case. Such
challenges are to be expected since MP is a relatively new ontology and since this is
a new application of it.</p>
        <p>Another challenge is to ensure that, as the evidence base scales, competency
questions can be answered efficiently. To address this, we are building the model
using an iterative design-and-test approach. In this process, efficient querying is
a key requirement.
6.3</p>
      </sec>
      <sec id="sec-8-2">
        <title>Other issues</title>
        <p>For enabling synthesis over the PDDI information, the model is not the only
concern. Applying this model will require integration work. One challenge is
inherent to scholarly documents: the existing evidence items within the DIKB
refer to many data, materials, and methods that exist only in PDF documents
accessible only through proprietary portals or academic library systems.
Consequently, resolving annotations requires a method for pointing to proprietary
oa:targets.
7</p>
        <p>Conclusions &amp; Future Work
of the art from scientific documents. The knowledge representations we are now
creating will be beneficial for integrating PDDI evidence, and we hope they will
inspire an increased use of linked data for evidence synthesis in other domains.
Acknowledgments
This work was carried out during the tenure of an ERCIM “Alain Bensoussan”
Fellowship Programme. The research leading to these results has received
funding from the European Union Seventh Framework Programme (FP7/2007-2013)
under grant agreement no 246016, and a grant from the National Library of
Medicine (1R01LM011838-01). We thank Carol Collins, Lisa Hines, and John R
Horn for serving on the Evidence Panel of “Addressing PDDI Evidence Gaps”,
and for contributing to the competency questions presented here.
[Abarca2003] Abarca, Jacob, Daniel C. Malone, Edward P. Armstrong, Amy J. Grizzle,
Philip D. Hansten, Robin C. Van Bergen, and Richard B. Lipton. “Concordance of
severity ratings provided in four drug interaction compendia.” Journal of the
American Pharmacists Association 44;2 (2003): 136–141.
[Boyce2014] Boyce, R.D. “A Draft Evidence Taxonomy and Inclusion Criteria for the
Drug Interaction Knowledge Base.” August 9, 2014, url:
http://purl.net/net/druginteraction-knowledge-base/evidence-types-and-inclusion-criteria
[Boyce2007] Boyce, Richard D., Carol Collins, John Horn, and Ira Kalet. “Modeling
Drug Mechanism Knowledge Using Evidence and Truth Maintenance.” IEEE
Transactions on Information Technology in Biomedicine 11;4 (2007): 386–397.
[Boyce2009] Boyce, Richard D., Carol Collins, John Horn, and Ira Kalet.
“Computing with evidence: Part I: A drug-mechanism evidence taxonomy oriented toward
confidence assignment.” Journal of Biomedical Informatics 42;6 (2009): 979–989.
[Ciccarese2008] Ciccarese, Paolo N., Elizabeth Wu, Gwen Wong, Marco Ocana, June
Kinoshita, Alan Ruttenberg, and Tim Clark. “The SWAN biomedical discourse
ontology.” Journal of Biomedical Informatics 41;5 (2008): 739–751.
[Ciccarese2014] Ciccarese, Paolo N., Marco Ocana, and Tim Clark. “Open semantic
annotation of scientific publications using DOMEO.” Journal of Biomedical
Semantics 3;Suppl 1 (2012): S1.
[Clark2014] Clark, Tim, Paolo N. Ciccarese, and Carole A. Goble. “Micropublications:
a semantic model for claims, evidence, arguments and annotations in biomedical
communications.” Journal of Biomedical Semantics 5;28 (2014).
[Hanna2013] Hanna, Josh, Eric Joseph, Mathias Brochhausen, and William R. Hogan.
“Building a drug ontology based on RxNorm and other sources.” Journal of
Biomedical Semantics 4 (2013): 44–52.
[Natale2011] Natale, Darren A., Cecilia N. Arighi, Winona C. Barker, Judith A. Blake,
Carol J. Bult, Michael Caudy, Harold J. Drabkin, Peter D’Eustachio, Alexei V.
Evsikov, Hongzhan Huang, Jules Nchoutmboube, Natalia V. Roberts, Barry Smith,
Jian Zhang and Cathy H. Wu. “The Protein Ontology: a structured representation of
protein forms and complexes.” Nucleic Acids Research 39;suppl 1 (2011): D539–D545.
[Saverno2011] Saverno, Kim R., Lisa E. Hines, Terri L. Warholak, Amy J. Grizzle,
Lauren Babits, Courtney Clark, Ann M. Taylor, and Daniel C. Malone. “Ability of
pharmacy clinical decision-support software to alert users about clinically important
drug-drug interactions.” Journal of the American Medical Informatics Association
18;1 (2011): 32–37.
[Wang2010] Wang, Lorraine M., Maple Wong, James M. Lightwood, and Christine
M. Cheng. “Black box warning contraindicated comedications: concordance among
three major drug interaction screening programs.” Annals of Pharmacotherapy 44;1
(2010): 28–34.
[W3C2013] Sanderson, Rob, Paolo N. Ciccarese, and Herbert Van de Sompel (editors).
“Open Annotation Data Model”, W3C Community Group Draft, 08 February 2013,
url: http://www.openannotation.org/spec/core/</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Capturing Provenance for a Linkset of</title>
    </sec>
    <sec id="sec-10">
      <title>Convenience</title>
      <p>Simon Jupp<sup>1</sup>, James Malone<sup>1</sup>, and Alasdair J. G. Gray<sup>2</sup>
<sup>1</sup> European Molecular Biology Laboratory, European Bioinformatics Institute
(EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, United Kingdom
<sup>2</sup> Department of Computer Science, Heriot-Watt University, Edinburgh, United Kingdom</p>
      <p>Abstract. Biological interactions such as those between genes and
proteins are complex and require intricate OWL models. However, direct
links between biological entities can support search and data
integration. In this paper we introduce linksets of convenience that capture
these direct links. We show the provenance statements required to track
the derivation of such linksets, linking them back to the full biological
justification.</p>
      <p>Keywords: Data linking, Provenance, VoID</p>
      <p>Investigating biological systems, such as those implicated in disease, necessitates
connecting many levels of biology: genes, gene variation, gene expression,
protein structure, signalling pathways, phenotypic and epidemiological data, and so
on. The ability to integrate data across these levels relies on links that can be
formed between biological entities, for example, going from a gene to proteins or
from proteins to pathways. For each of these links there is some biological justification
that may involve several steps (see Section 2 for details). To support tasks such
as search and data integration, it is convenient to provide additional shortcuts in
the form of direct links, e.g. from genes to pathways.</p>
      <p>Modeling the true nature of the links using Semantic Web technologies such
as OWL removes ambiguity when working with data by giving it well-defined,
precise semantics. However, it increases the complexity of interacting with
the data, as the OWL model needs to capture the full intricacies of the biological
interactions. As we move to publish biological data as linked open data, there
is an opportunity to describe direct links between types of biological entities
that feature in common queries, such as gene to protein. These shortcuts capture
the way biologists often discuss the domain and enable novel integrations of the data.
They provide a working notion that cuts through the biology but does not necessitate
capturing (or recapturing) the complex multivariate relationships that can hold
between the two entities. Such linksets are already used to support the Open
PHACTS Discovery Platform [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], although those linksets do not have adequate
provenance.
      </p>
      <p>Figure 1: Ensembl Gene (so:gene), Ensembl Transcript (so:transcript), Ensembl Exon (so:exon), Ensembl Protein (so:polypeptide) and uniprot:Protein, connected by so:transcribed_from, so:translates_to, so:has_part and :ep2upRelation.</p>
      <p>
        In this paper we propose a mechanism to model these links of convenience
using a combination of VoID linksets [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and PROV [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We avoid
misrepresenting links by using semantically weaker relationships together with
additional provenance that represents the underlying complexity. We illustrate the model
with an example using data from two popular biological databases.
      </p>
      <p>Linking genes to proteins use case.</p>
      <p>
        We motivate our work with an example mapping between Ensembl [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (a database
of genome annotation) and UniProt [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] (a database of protein sequences). These
databases already contain cross-references between an Ensembl Gene (EG) and
a UniProt Protein (UP). However, to understand how this mapping is generated,
one currently needs to find the relevant publications and online
documentation; this information is not directly discoverable from the data.
      </p>
      <p>Biological theory tells us that a gene encodes a protein, although this
biological relation only truly holds for the link between the EG and the Ensembl
Protein (EP) entity. There are in fact multiple types of EP to UP mappings; for
instance, they can be derived from an exact sequence identity or based on a
percentage sequence identity. Figure 1 illustrates how we model EG to EP using
terminology defined in the Sequence Ontology; for illustration we include a
superproperty of all the EP to UP mappings that we call :ep2upRelation
(UniProt is currently extending its vocabulary to define these relations).
We introduce a link of convenience (dashed line) from the EG to the UP, expressed
with the semantically weak skos:related relation, to support queries. This schema
lacks the provenance to assert that the related link of convenience is derived from
the longer chain of semantically richer links that hold from a gene to a protein.
</p>
      <preformat># define the ensembl protein partition
:ensembl void:classPartition :EPpartition .
:EPpartition void:class so:Polypeptide .

# define the Uniprot protein partition
:uniprot void:classPartition :UPpartition .
:UPpartition void:class uniprot:Protein .

# define the linkset that links the two partitions
:ensemblProteinToUniprotProteinLinkset a void:Linkset ;
    void:linkPredicate :ep2upRelation ;
    void:subjectsTarget :EPpartition ;
    void:objectsTarget :UPpartition .

# define partitions for ensembl gene, gene transcript and
# transcript protein
:ensembl void:classPartition :ensemblGenePartition ;
    void:propertyPartition :ensemblGeneTranscriptPartition ;
    void:propertyPartition :ensemblTranscriptProteinPartition .
:ensemblGenePartition void:class so:gene .
:ensemblGeneTranscriptPartition void:property so:transcribed_from .
:ensemblTranscriptProteinPartition void:property so:translates_to .

# define the linkset that links the two partitions,
# including the dataset description that contains the triples that
# are used to derive this linkset
:ensemblGeneToUniprotProteinLinkset a void:Linkset ;
    void:linkPredicate skos:related ;
    void:subjectsTarget :ensemblGenePartition ;
    void:objectsTarget :UPpartition ;
    prov:wasDerivedFrom :ensemblGeneTranscriptPartition ,
        :ensemblTranscriptProteinPartition ,
        :ensemblProteinToUniprotProteinLinkset .</preformat>
      <p>The model outlined in Figure 1 can be decorated with provenance that captures
additional information about how the link of convenience between EG and UP is
derived. The resulting linkset description is shown in Figure 2. In the following,
we describe each block of RDF.</p>
      <p>The VoID vocabulary for describing linked datasets allows RDF links between
datasets to be described as VoID linksets. A linkset describes the links,
captured as a set of triples, between two datasets. We can also use VoID to describe
relevant partitions of the datasets based on individual properties or classes; these
form new subsets that can participate in multiple linksets. In our scenario we
need to capture two crucial linksets: the first is the EP to UP linkset, and the
second is the more convenient EG to UP linkset.</p>
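      <p>To make the difference between class partitions, property partitions and linksets concrete, the following minimal Python sketch mimics their meaning over an in-memory list of triples. It is an illustration under stated assumptions, not part of VoID or of any RDF library: triples are plain string tuples, a class partition is approximated by the set of instances of its void:class, and the identifiers with the ens: and up: prefixes are hypothetical.</p>

```python
# Minimal sketch (assumption): VoID partitions and linksets approximated as
# filters over plain (subject, predicate, object) string tuples.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def class_partition(triples: List[Triple], cls: str) -> Set[str]:
    """Instances typed with `cls`, i.e. what a partition with void:class selects."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


def property_partition(triples: List[Triple], prop: str) -> List[Triple]:
    """Triples using `prop`, i.e. what a partition with void:property selects."""
    return [t for t in triples if t[1] == prop]


def linkset(triples: List[Triple], link_predicate: str,
            subjects: Set[str], objects: Set[str]) -> List[Triple]:
    """Links using `link_predicate` whose subject and object fall in the two targets."""
    return [(s, p, o) for s, p, o in triples
            if p == link_predicate and s in subjects and o in objects]


# Hypothetical example data (identifiers are made up for illustration).
data: List[Triple] = [
    ("ens:p1", RDF_TYPE, "so:Polypeptide"),
    ("up:q1", RDF_TYPE, "uniprot:Protein"),
    ("ens:t1", "so:translates_to", "ens:p1"),
    ("ens:p1", ":ep2upRelation", "up:q1"),
    ("ens:p1", ":ep2upRelation", "up:unknown"),  # object not typed as uniprot:Protein
]

ep_partition = class_partition(data, "so:Polypeptide")
up_partition = class_partition(data, "uniprot:Protein")
print(linkset(data, ":ep2upRelation", ep_partition, up_partition))
print(property_partition(data, "so:translates_to"))
```

      <p>Only the first :ep2upRelation triple ends up in the sketch's linkset, because the second one points to an entity outside the UniProt class partition; this loosely mirrors how void:subjectsTarget and void:objectsTarget describe where the two ends of each link live.</p>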
      <p>The EP-UP linkset (:ensemblProteinToUniprotProteinLinkset) captures the :ep2upRelation
link between EP entities in the Ensembl dataset and UP entities in the UniProt dataset.
We describe two further subsets: the EP partition (:EPpartition) of all entities of type
so:Polypeptide in the Ensembl dataset, and the UniProt partition (:UPpartition) of
all entities of type uniprot:Protein.</p>
      <p>The EG to UP link of convenience needs a similar linkset description
(:ensemblGeneToUniprotProteinLinkset), based on an EG partition and the previously
defined UP partition, although this time the link predicate is skos:related. We also
want to record that the triples in this linkset are derived from another set of triples,
which captures that skos:related is a shortcut for a more complex path through the RDF
graph. Again we can use VoID partitioning, but this time with property-based
partitions to identify the EG to Ensembl Transcript (ET) and ET to EP links
(:ensemblGeneTranscriptPartition and :ensemblTranscriptProteinPartition). Finally, we
use the prov:wasDerivedFrom relation to link the convenience linkset to the partitions
and the linkset that together describe the full path of relations that the shortcut
represents.</p>
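      <p>The derivation recorded by prov:wasDerivedFrom can also be made operational. The Python sketch below, a minimal illustration and not the authors' implementation, materialises the EG to UP skos:related links by composing the transcript and protein links, and emits a linkset description mirroring the Figure 2 metadata. It assumes the triple directions transcript so:transcribed_from gene, transcript so:translates_to protein, and protein :ep2upRelation UniProt entry; the sub-property names for the different mapping types and all ens:/up: identifiers are hypothetical.</p>

```python
# Minimal sketch (assumptions: triple directions and the mapping sub-property
# names below are illustrative, not taken from Ensembl or UniProt).
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping types, treated as sub-properties of :ep2upRelation.
EP2UP_PREDICATES = {":ep2upRelation", ":exactIdentityMapping", ":partialIdentityMapping"}


def index(triples: List[Triple], predicates: Set[str]) -> Dict[str, Set[str]]:
    """Map each subject to its objects for triples whose predicate is in `predicates`."""
    out: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p in predicates:
            out[s].add(o)
    return out


def derive_convenience_links(triples: List[Triple]) -> Set[Triple]:
    """Compose gene <- transcript -> protein -> UniProt into gene skos:related UniProt."""
    genes_of = index(triples, {"so:transcribed_from"})  # transcript -> genes
    proteins_of = index(triples, {"so:translates_to"})  # transcript -> proteins
    uniprot_of = index(triples, EP2UP_PREDICATES)       # protein -> UniProt entries
    links: Set[Triple] = set()
    for transcript, genes in genes_of.items():
        for protein in proteins_of.get(transcript, set()):
            for up in uniprot_of.get(protein, set()):
                for gene in genes:
                    links.add((gene, "skos:related", up))
    return links


def linkset_description(linkset: str) -> List[Triple]:
    """VoID + PROV metadata for the derived linkset, following Figure 2."""
    return [
        (linkset, "rdf:type", "void:Linkset"),
        (linkset, "void:linkPredicate", "skos:related"),
        (linkset, "void:subjectsTarget", ":ensemblGenePartition"),
        (linkset, "void:objectsTarget", ":UPpartition"),
        (linkset, "prov:wasDerivedFrom", ":ensemblGeneTranscriptPartition"),
        (linkset, "prov:wasDerivedFrom", ":ensemblTranscriptProteinPartition"),
        (linkset, "prov:wasDerivedFrom", ":ensemblProteinToUniprotProteinLinkset"),
    ]


data: List[Triple] = [
    ("ens:t1", "so:transcribed_from", "ens:g1"),
    ("ens:t1", "so:translates_to", "ens:p1"),
    ("ens:p1", ":exactIdentityMapping", "up:q1"),
    ("ens:t2", "so:transcribed_from", "ens:g1"),
    ("ens:t2", "so:translates_to", "ens:p2"),  # p2 has no UniProt mapping
]
print(sorted(derive_convenience_links(data)))
# -> [('ens:g1', 'skos:related', 'up:q1')]
```

      <p>Keeping the derived links in their own linkset, with prov:wasDerivedFrom pointing at the partitions and linkset they were computed from, lets a consumer tell that skos:related is a shortcut rather than an asserted biological relation.</p>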
      <p>Discussion
It is always important to model data as accurately as possible,
and publishing data with RDF and OWL is well suited to this task. The VoID
vocabulary already provides a mechanism to define and attach provenance to
linksets between datasets, and we propose the use of PROV to connect
linksets that are derived from other linksets. As a Web of linked biological data
emerges, there is a need to identify links that are there for convenience and to
expose how they relate back to the core biological (OWL) model. In cases where
a link of convenience is derived from a series of other linksets, it is desirable to
be able to spot this and unpack the convenience links using common queries.
The proposed model supports this task, but questions remain as to whether VoID
and PROV are sufficient, so we hope this preliminary work can help motivate the
discussion.</p>
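      <p>As a rough illustration of spotting and unpacking a link of convenience, the Python sketch below first checks whether a linkset declares any prov:wasDerivedFrom source and then recovers the gene, transcript and protein path behind a single skos:related link. It is a sketch under assumptions, not a proposed query mechanism: data and metadata are plain lists of triples, the triple directions are assumed to be transcript so:transcribed_from gene and transcript so:translates_to protein, and the ens:/up: identifiers are hypothetical. In a triple store the same walk would more naturally be a SPARQL query.</p>

```python
# Minimal sketch (assumption): metadata and data are plain (s, p, o) tuples.
from typing import List, Tuple

Triple = Tuple[str, str, str]
Path = Tuple[str, str, str, str]  # (gene, transcript, protein, mapping predicate)


def is_convenience_linkset(metadata: List[Triple], linkset: str) -> bool:
    """Treat a linkset as a shortcut if it declares any prov:wasDerivedFrom source."""
    return any(s == linkset and p == "prov:wasDerivedFrom" for s, p, _ in metadata)


def unpack(data: List[Triple], gene: str, uniprot: str) -> List[Path]:
    """Recover every gene -> transcript -> protein -> UniProt path behind one link."""
    paths: List[Path] = []
    for transcript, p1, g in data:
        if p1 != "so:transcribed_from" or g != gene:
            continue
        for t, p2, protein in data:
            if t != transcript or p2 != "so:translates_to":
                continue
            for s, p3, o in data:
                if s == protein and o == uniprot:
                    paths.append((gene, transcript, protein, p3))
    return paths


metadata: List[Triple] = [
    (":ensemblGeneToUniprotProteinLinkset", "prov:wasDerivedFrom",
     ":ensemblProteinToUniprotProteinLinkset"),
]
data: List[Triple] = [
    ("ens:t1", "so:transcribed_from", "ens:g1"),
    ("ens:t1", "so:translates_to", "ens:p1"),
    ("ens:p1", ":ep2upRelation", "up:q1"),
]
if is_convenience_linkset(metadata, ":ensemblGeneToUniprotProteinLinkset"):
    print(unpack(data, "ens:g1", "up:q1"))
# -> [('ens:g1', 'ens:t1', 'ens:p1', ':ep2upRelation')]
```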
      <sec id="sec-10-1">
        <title>Acknowledgements</title>
        <p>EBI contribution supported by EU FP7 BioMedBridges Grant 284209.</p>
        <p>References</p>
        <p>1. Gray, A.J.G., Groth, P., Loizou, A., Askjaer, S., Brenninkmeijer, C.Y.A., Burger,
K., Chichester, C., Evelo, C.T., Goble, C.A., Harland, L., Pettifer, S., Thompson,
M., Waagmeester, A., Williams, A.J.: Applying linked data approaches to
pharmacology: Architectural decisions and implementation. Semant. Web 5 (2014) 101–113</p>
        <p>2. Alexander, K., Cyganiak, R., Hausenblas, M., Zhao, J.: Describing Linked Datasets
with the VoID Vocabulary. Note, W3C (March 2011)</p>
        <p>3. Lebo, T., Sahoo, S.S., McGuinness, D.: PROV-O: The PROV Ontology. W3C
Recommendation (2013) http://www.w3.org/TR/prov-o/</p>
        <p>4. Flicek, P., Amode, M.R., Barrell, D., et al.: Ensembl 2014. Nucleic Acids Research
42 (2014) D749–D755 doi: 10.1093/nar/gkt1196.</p>
        <p>5. The UniProt Consortium: Activities at the universal protein resource (UniProt).
Nucleic Acids Research 42 (2014) D191–D198 doi: 10.1093/nar/gkt1140.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Connecting Science Data Using Semantics and</title>
    </sec>
    <sec id="sec-12">
      <title>Information Extraction</title>
      <p>Evan W. Patton and Deborah L. McGuinness</p>
      <p>Rensselaer Polytechnic Institute
110 8th Street, Troy, NY 12180 USA</p>
      <p>{pattoe, dlm}@cs.rpi.edu</p>
      <p>Abstract. We are developing prototypes that explicate our vision of
connecting personal medical data to scientific literature as well as to
emerging grey literature (e.g., community forums) to help people find
and understand information relevant to complex medical journeys. We
focus on robust combinations of natural language processing, linked data,
and knowledge representation to build knowledge graphs that
help people make sense of current conditions and enable new forms of
scientific hypothesis generation. We present our work in the context of a
breast cancer use case. We discuss the benefits of biomedical linked data
resources and describe some potential assistive technology for navigating
rich, diverse medical content.</p>
      <p>Keywords: knowledge representation, explanation, clinical notes,
natural language, web forums, nanopublications</p>
      <p>As scientific knowledge continues to grow in size and diversity, it is increasingly
difficult to discover and manage information relevant to any particular context.
It can be challenging to determine how a statement or report relates to others
and to form and evaluate (often competing) hypotheses, e.g. related to diagnosis
or treatment paths. Complications grow when content is both structured and
unstructured, and when some of it comes from less accredited sources. We aim to expand
the boundaries of Linked Science by focusing on evidence modeling using
natural language processing (NLP) techniques over broad content and by identifying
promising data-driven hypotheses using linked data and nanopublication-style
encodings. We present this discussion in the context of a breast cancer
demonstration use case informed by challenges experienced during a co-author’s recent
cancer journey. Cancer is a complex disease to manage and treat, often requiring
chemotherapy, surgery, radiation, and drugs to reduce recurrence. We show how
semantic technologies combined with NLP algorithms can help the patient manage
this information.</p>
      <p>A breast cancer patient wishes to better understand her diagnosis and
planned treatment. She is interested in the expected side effects of chemotherapy
and in leveraging the experiences of similar individuals to proactively find and
evaluate promising coping strategies. She reads through oncologist-provided
documents about her proposed chemotherapy drugs and uses search engines to find
out more about likely adverse effects that appear detrimental to her quality of life.
She finds conflicting opinions on the efficacy of different coping strategies and
needs an approach to weigh the possible pros and cons effectively. Managing this
information is mentally taxing and can easily overwhelm a patient.</p>
      <p>Our patient needs to find and comprehend potentially conflicting evidence
about treatment options and side effects. We propose new software, using a
variety of artificial intelligence tools built on the interoperability principles
promulgated by linked data and the Semantic Web, to address these challenges.</p>
      <p>The patient uses current technologies to obtain information about her
treatment strategy and to formulate promising side effect mitigations. This can be
time-consuming for anyone, but more so for medically naïve patients.
Furthermore, web forums and social networking sites are becoming increasingly
common venues for discourse between patients, as they often include
anecdotal reports that have not yet been validated through clinical trials but
may be valuable. These reports are often presented in layperson terms and sometimes
attract new patients who may be less medically literate. Due to a lack of
scientific rigor, there may be contradictory or unsupported information available, as
shown in the following two answers about a mitigation for the very common
taxol-related nail bed problem:</p>
      <p>
        My onc[ology] nurse told me to rub tea tree oil into my cuticles and
nails every night. It is a natural anti-septic and for whatever reason can
sometimes help prevent nail infections and lifting during taxol.<sup>1</sup>
I wouldn’t use tea tree oil. A friend did on some cracked skin and it got
worse.<sup>2</sup></p>
      <p>The first suggestion is a common preventive approach for nail problems: it holds
that tea tree oil prevents nail infections because “it is a natural anti-septic” and
appeals to authority (“my onc nurse told me to...”). The second suggestion, from a
different user in the same thread, advises against tea tree oil because “a friend [applied tea
tree oil] on some cracked skin and it got worse.” Natural language processing techniques
may be used to extract coping strategies for particular conditions, but without
deeper knowledge, provenance, and tools, the user may not know how to evaluate
and/or integrate potentially contradictory suggestions. We are extending the joint
extraction techniques proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] with semantic background knowledge to aid
in extracting linked data from medical records.
</p>
      <p>The Repurposing Drugs using Semantics (ReDrugS) project [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] has focused on
modeling evidence using small units of publishable information called
nanopublications [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. ReDrugS uses linked data sources to build a knowledge base of
nanopublications that is then reasoned about using probabilistic techniques to
identify potential links between proteins, drugs, binding sites, and genes, with
the ultimate aim of discovering possible new off-label uses for FDA-approved
drugs. This project’s success has been partially due to the large corpus of linked
data and ontologies generated by the biomedical community over the past few
decades. ReDrugS has ingested content from 17 structured, curated data sources,
including content concerning drugs, alternate names, conditions, and pathways.
Once a chemotherapy protocol is extracted from medical notes, ReDrugS can be
used to find alternative drug names along with related conditions. This
framework, together with the side effect resource SIDER (whose integration is in
progress), can help the patient find chemotherapy drug side effects and some
mitigations by applying its search techniques to authoritative drug resources,
such as looking for anti-nausea prescription drugs. The infrastructure for this
system could be repurposed for other scientific domains, but only if linked data
sources are abundant in those domains or if quality linked data can be
generated by automated methods, e.g. via natural language processing of web-based
resources.
      </p>
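      <p>To illustrate how the two forum answers could be kept side by side rather than merged, the Python sketch below stores each as a nanopublication-style record with separate assertion, provenance and publication-information parts, and groups the records about one claim by stance. It is a hypothetical sketch, not the ReDrugS schema: the field names, the stance flag and the example values are assumptions chosen for illustration.</p>

```python
# Hypothetical sketch: nanopublication-style records (assertion, provenance,
# publication info); field names and values are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Claim = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Nanopub:
    assertion: Claim
    stance: str  # "supports" or "opposes" the assertion
    provenance: Dict[str, str] = field(default_factory=dict)  # where the claim came from
    pubinfo: Dict[str, str] = field(default_factory=dict)     # how this record was created


def by_stance(records: List[Nanopub], claim: Claim) -> Dict[str, List[Nanopub]]:
    """Group the records about `claim` by stance so both sides can be shown."""
    grouped: Dict[str, List[Nanopub]] = {"supports": [], "opposes": []}
    for record in records:
        if record.assertion == claim:
            grouped.setdefault(record.stance, []).append(record)
    return grouped


claim: Claim = ("tea tree oil", "recommendedFor", "taxol-related nail problems")
records = [
    Nanopub(claim, "supports",
            provenance={"source": "forum answer 1", "basis": "advice from an oncology nurse"},
            pubinfo={"extractedBy": "hypothetical NLP pipeline"}),
    Nanopub(claim, "opposes",
            provenance={"source": "forum answer 2", "basis": "anecdote about a friend"},
            pubinfo={"extractedBy": "hypothetical NLP pipeline"}),
]
for stance, items in by_stance(records, claim).items():
    print(stance, [r.provenance["basis"] for r in items])
```

      <p>Because provenance travels with each assertion in this sketch, an interface could show the patient both answers together with what each one is based on, instead of silently choosing one.</p>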
      <p>
        Explanations
We aim to provide extensive explanation mechanisms since explanation is a key
component of transparent systems and user studies have shown that
explanations are required if agents are to be trusted [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We aid explanation generation
through the collection of provenance, modeled using the W3C’s PROV
ontology [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. PROV-O is a standard for modeling provenance information on the web,
which allows tools to integrate distributed provenance information from different
systems. We use this provenance to help construct end user explanations that
include both lineage of content and support (and opposition) for a statement.
      </p>
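      <p>A minimal Python sketch of how lineage for such an explanation might be collected from PROV-O statements follows. It walks prov:wasDerivedFrom links breadth-first, visiting each ancestor once so that cycles cannot loop forever, and reports any prov:wasAttributedTo agent for each ancestor. Representing provenance as plain triples, and all entity and agent names, are hypothetical; a deployed system would query its provenance graph instead.</p>

```python
# Minimal sketch (assumption): provenance as plain (s, p, o) tuples using the
# PROV-O terms prov:wasDerivedFrom and prov:wasAttributedTo; names are made up.
from collections import deque
from typing import List, Tuple

Triple = Tuple[str, str, str]


def lineage(prov: List[Triple], entity: str) -> List[str]:
    """Breadth-first walk over prov:wasDerivedFrom; each ancestor is visited once."""
    seen = {entity}
    order: List[str] = []
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        for s, p, o in prov:
            if s == current and p == "prov:wasDerivedFrom" and o not in seen:
                seen.add(o)
                order.append(o)
                queue.append(o)
    return order


def explain(prov: List[Triple], entity: str) -> List[str]:
    """One explanation line per ancestor, naming its agent(s) when known."""
    lines = []
    for ancestor in lineage(prov, entity):
        agents = [o for s, p, o in prov if s == ancestor and p == "prov:wasAttributedTo"]
        suffix = f" (attributed to {', '.join(agents)})" if agents else ""
        lines.append(f"{entity} derives from {ancestor}{suffix}")
    return lines


prov: List[Triple] = [
    (":recommendation", "prov:wasDerivedFrom", ":drugNanopub"),
    (":drugNanopub", "prov:wasDerivedFrom", ":curatedRecord"),
    (":curatedRecord", "prov:wasAttributedTo", ":curatedDrugDatabase"),
]
for line in explain(prov, ":recommendation"):
    print(line)
```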
      <p>
        We identify potential evidence on the use of tea tree oil for
chemotherapy-induced nail bed problems. A patient would want to know the evidence,
source, and authoritativeness for both views; she might also want the system to
further decompose these arguments and present supporting evidence for the
antimicrobial nature of tea tree oil from more authoritative sources (e.g. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]).
      </p>
      <p>We claim that we can reuse the ReDrugS content to find prescription drugs
for chemotherapy side effects. Provenance may be displayed to show that a
recommendation comes from a validated, authoritative source. While that framework
was originally designed to find potential new off-label uses for drugs along with
confidence ratings, the explanation component is more critical for our use, so that
researchers may inspect evidence sources and the methods used to determine
the system's confidence. Without such explanations, people would have difficulty
evaluating competing suggestions.</p>
      <p>Natural language processing can expose some of the unstructured content of
medical records as structured content, as well as assist in generating linked data
from unstructured sources. The ReDrugS framework provides a
semantically integrated system combining many different structured biomedical resources to
generate a broadly reusable knowledge graph. By integrating the natural
language and structured knowledge representation approaches, we can obtain a
much richer annotated knowledge base that includes source and confidence
information. Our prototypes demonstrate some ways that this rich resource may
be used to help patients and their support networks discover, integrate, and
evaluate information relevant to complicated medical situations and form
transparent, data-driven hypotheses about how to proceed. We believe these
efforts demonstrate some opportunities for future AI-enhanced, Linked
Science-based assistants that use the wealth of structured content as well as the growing
grey literature collection.</p>
      <p>Acknowledgements
The authors thank Heng Ji and Alex Borgida for their discussions that helped
shape this work.</p>
      <p>References</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Glass</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolverton</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Toward establishing trust in adaptive agents</article-title>
          .
          <source>In: 13th Intl Conference on Intelligent User Interfaces</source>
          . pp.
          <fpage>227</fpage>
          -
          <lpage>236</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gibson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velterop</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>The anatomy of a nanopublication</article-title>
          .
          <source>Information Services &amp; Use</source>
          <volume>30</volume>
          ,
          <fpage>51</fpage>
          -
          <lpage>56</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lebo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahoo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>PROV-O: The PROV ontology</article-title>
          .
          <source>Tech. rep., W3C</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Incremental joint extraction of entity mentions and relations</article-title>
          .
          <source>In: Proc. of the 52nd Annual Meeting of the Association for Computational Linguistics</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>McCusker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solanki</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumontier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dordick</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>A nanopublication framework for systems biology and drug repurposing</article-title>
          .
          <source>In: CSHALS</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pazyar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yaghoobi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagherani</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaerouni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A review of applications of tea tree oil in dermatology</article-title>
          .
          <source>International Journal of Dermatology</source>
          pp.
          <fpage>784</fpage>
          -
          <lpage>90</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>