<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Three Steps to Heaven: Semantic Publishing in a Real World Workflow</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Phillip Lord</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simon Cockell</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Stevens</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Semantic publishing offers the promise of computable papers, enriched visualisation and a realisation of the linked data ideal. In reality, however, the publication process contrives to prevent richer semantics while culminating in a 'lumpen' PDF. In this paper, we discuss a web-first approach to publication, and describe a three-tiered approach which integrates with the existing authoring tooling. Critically, although it adds limited semantics, it does provide value to all the participants in the process: the author, the reader and the machine.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The publishing of both data and narratives on those data is changing radically.
Linked Open Data and related semantic technologies allow for semantic
publishing of data. We still need, however, to publish the narratives on that data and
that style of publishing is in the process of change; one of those changes is the
incorporation of semantics [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1,2,3</xref>
        ]. The idea of semantic publishing is an attractive
one for those who wish to consume papers electronically; it should enhance the
richness of the computational component of papers [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It promises a realisation
of the vision of a next generation of the web, with papers becoming a critical
part of a linked data environment [
        <xref ref-type="bibr" rid="ref1 ref4">1,4</xref>
        ], where the results and narratives become
one.
      </p>
      <p>
        The reality, however, is somewhat different. There are significant barriers to
the acceptance of semantic publishing as a standard mechanism for academic
publishing. The web was invented around 1990 as a light-weight mechanism for
publication of documents. It has subsequently had a massive impact on society
in general. It has, however, barely touched most scientific publishing; while most
journals have a website, the publication process still revolves around the
generation of papers, moving from Microsoft Word or LaTeX [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], through to a final
PDF which looks, feels and is something designed to be printed onto paper<sup>1</sup>.
Adding semantics into this environment is difficult or impossible; the content
of the PDF has to be exposed and semantic content retro-fitted or, in all
likelihood, a complex process of author and publisher interaction has to be devised
and followed. If semantic data publishing and semantic publishing of academic
narratives are to work together, then academic publishing needs to change.
1 This includes conferences dedicated to the web and the use of web technologies.
      </p>
      <p>In this paper, we describe our attempts to take a commodity publication
environment, and modify it to bring in some of the formality required from
academic publishing. We illustrate this with three exemplars, different kinds
of knowledge that we wish to enhance. In the process, we add a small amount
of semantics to the finished articles. Our key constraint is the desire to add
value for all the human participants. Both authors and readers should see and
recognise additional value, with the semantics a useful or necessary byproduct
of the process, rather than the primary motivation. We characterise this process
as our "three steps to heaven", namely:
- make life better for the machine to
- make life better for the author to
- make life better for the reader</p>
      <p>While requiring additional value for all of these participants is hard, and
places significant limitations on the level of semantics that can be achieved, we
believe it does increase the likelihood that content will be generated in the first
place, and represents an attempt to enable semantic publishing in a real-world
workflow.</p>
    </sec>
    <sec id="sec-2">
      <title>Knowledgeblog</title>
      <p>
        The knowledgeblog project stemmed from the desire for a book describing the
many aspects of ontology development, from the underlying formal semantics, to
the practical technology layer and, finally, through to the knowledge domain [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
However, we have found the traditional book publishing process frustrating and
unrewarding. While scientific authoring is difficult in its own right, our own
experience suggests that the publishing process is extremely hard work. This is
particularly so for multi-author collected works, which are often harder for the
editor than writing a book "solo". Finally, the expense and hard copy nature of
academic books means that, again in our experience, few people read them.
      </p>
      <p>
        This contrasts starkly with the web-first publication process that has become
known as blogging. With any of a number of ready-made platforms, it is possible
for authors with little or no technical skill to publish content to the web with
ease. For knowledgeblog ("kblog"), we have taken one blogging engine,
WordPress [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], running on low-end hardware, and used it to develop a multi-author
resource describing the use of ontologies in the life sciences (our main field of
expertise). There are also kblogs on bioinformatics<sup>2</sup> and the Taverna workflow
environment<sup>3</sup> [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We have previously described how we addressed some of the
social aspects, including attribution, reviewing and immutability of articles [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>As well as delivering content, we are also using this framework to investigate
semantic academic publishing, investigating how we can enhance the machine
interpretability of the final paper, while living within the key constraint of making
life (slightly) better for machine, author and reader without adding complexity
for the human participants.</p>
      <sec id="sec-2-1">
        <title>2 http://bioinformatics.knowledgeblog.org</title>
      </sec>
      <sec id="sec-2-2">
        <title>3 http://taverna.knowledgeblog.org</title>
        <p>Scientific authors are relatively conservative. Most of them have well-established
toolsets and workflows which they are relatively unwilling to change. For
instance, within the kblog project, we have used workshops to start the process of
content generation. For our initial meeting, we gave authors little guidance on the
authoring process, as a result of which most attempted to use WordPress
directly for authoring. The WordPress editing environment is, however, web-based,
and was originally designed for editing short, non-technical articles. It did not
appear to work well for most scientists.</p>
        <p>
          The requirements that authors have for such 'scientific' articles are manifold.
Many wish to be able to author while offline (particularly on trains or planes).
Almost all scientific papers are multi-author, and some degree of collaboration
is required. Many scientists in the life sciences wish to author in Word because
grant bodies and journals often produce templates as Word documents. Many
wish to use LaTeX, because its idiomatic approach to programming documents
cannot be replicated with anything else. Fortunately, it is possible to induce
WordPress to accept content from many different authoring tools, including Word
and LaTeX [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>As a result, during the kblog project, we have seen many different workflows
in use, often highly idiosyncratic in nature. These include:</p>
        <p>Word/Email: Many authors write using MS Word and collaborate by emailing
files around. This method has a low barrier to entry, but requires significant
social processes to prevent conflicting versions, particularly as the number
of authors increases.</p>
        <p>Word/Dropbox: For the Taverna kblog, authors wrote in Word and
collaborated with Dropbox.<sup>4</sup> This method works reasonably well where many
authors are involved; Dropbox detects conflicts, although it cannot prevent or
merge them.</p>
        <p>Asciidoc/Dropbox: Used by the authors of this paper. Asciidoc<sup>5</sup> is relatively
simple, somewhat programmable and accessible. Unlike LaTeX, which can be
induced to produce HTML with effort, asciidoc is designed to do so.</p>
        <p>Of these three approaches, the Word/Dropbox combination is probably the
most generally used.</p>
        <p>From the reader's perspective, a decision that we have made within
knowledgeblog is to be "HTML-first". The initial reasons for this were entirely
practical; supporting multiple toolsets is hard, particularly if any degree of consistency
is to be maintained; the generation of the HTML is at least partly controlled by
the middleware (WordPress in kblog's case). As well as enabling consistency of
presentation it also, potentially, allows us to add additional knowledge; it makes
semantic publication a possibility. However, we are aware that knowledgeblog
currently scores rather badly on what we describe as the "bath-tub test"; while
exporting to PDF or printing out is possible, the presentation is not as "neat" as
would be ideal. In this regard (and we hope only in this regard), the
knowledgeblog experience is limited. However, increasingly, readers are happy and capable
of interacting with material on the web, without printouts.</p>
      </sec>
      <sec id="sec-2-3">
        <title>4 http://www.dropbox.com</title>
      </sec>
      <sec id="sec-2-4">
        <title>5 http://www.methods.co.nz/asciidoc/</title>
        <p>From this background and aim, we have drawn the following requirements:
1. The author can, as much as possible, remain within familiar authoring
environments;
2. The representation of the published work should remain extensible to, for
instance, semantic enhancements;
3. The author and reader should be able to have the amount of "formal"
academic publishing they need;
4. Support for semantic publishing should be gradual and offer advantages for
author and reader at all stages.</p>
        <p>We describe how we have achieved this with three exemplars, two of which
are relatively general in use, and one more specific to biology. In each case, we
have taken a slightly different approach, but have fulfilled our primary aim of
making life better for machine, author and reader.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Representing Mathematics</title>
      <p>The representation of mathematics is a common need in academic literature.
Mathematical notation has grown from a requirement for a syntax which is highly
expressive and relatively easy to write. It presents specific challenges because of
its complexity, the difficulty of authoring and the difficulty of rendering, away
from the chalk board that is its natural home.</p>
      <p>
        Support for mathematics has had a significant impact on academic
publishing. It was, for example, the original motivation behind the development of
TeX [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and it is still one of the main reasons why authors wish to use it or its
derivatives. This is to such an extent that much mathematics rendering on the
web is driven by a TeX engine somewhere in the process. So MediaWiki (and
therefore Wikipedia), Drupal and, of course, WordPress follow this route. The
latter provides plugin support for TeX markup using the wp-latex plugin [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Within kblog, we have developed a new plugin called mathjax-latex [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. From
the kblog author's perspective these two offer a similar interface; the differences
are, therefore, described later.
      </p>
      <p>Authors write their mathematics directly as TeX using one of four markup
syntaxes. The most explicit (and therefore least likely to happen accidentally)
is through the use of "shortcodes".<sup>6</sup> These are an HTML-like markup originating
from some forum/bulletin board systems. In this form an equation would be
entered as [latex]e=mc^2[/latex], which would be rendered as "e = mc<sup>2</sup>". It
is also possible to use three other syntaxes which are closer to math-mode in TeX:
$$e=mc^2$$, $latex e=mc^2$, or \[e=mc^2\].</p>
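      <p>The four syntaxes can be recognised mechanically. The following is a minimal Python sketch, not the code used by wp-latex or mathjax-latex: the regular expressions, the syntax labels and the function name find_math are all assumptions made for illustration, and the patterns ignore edge cases such as escaped dollar signs.</p>
      <preformat><![CDATA[
```python
import re

# Assumed patterns for the four syntaxes named in the text. They are
# illustrative only and are not taken from wp-latex or mathjax-latex.
MATH_PATTERNS = [
    ("shortcode", re.compile(r"\[latex\](.+?)\[/latex\]", re.DOTALL)),
    ("double-dollar", re.compile(r"\$\$(.+?)\$\$", re.DOTALL)),
    ("dollar-latex", re.compile(r"\$latex\s+(.+?)\$", re.DOTALL)),
    ("bracket", re.compile(r"\\\[(.+?)\\\]", re.DOTALL)),
]


def find_math(text):
    """Return (syntax, tex_source) pairs, grouped by syntax in list order."""
    found = []
    for name, pattern in MATH_PATTERNS:
        for match in pattern.finditer(text):
            found.append((name, match.group(1).strip()))
    return found


post = r"[latex]e=mc^2[/latex] or $$e=mc^2$$ or $latex e=mc^2$ or \[e=mc^2\]"
for syntax, tex in find_math(post):
    print(syntax, tex)
```
]]></preformat>
      <p>Whatever syntax the author chose, the extracted TeX source is the same string, which is what makes a later, uniform transformation plausible.</p>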
      <sec id="sec-3-1">
        <title>6 http://codex.wordpress.org/Shortcode</title>
        <p>From the authorial perspective, we have added significant value, as it is
possible to use a variety of syntaxes, which are independent of the authoring
engine. For example, a TeX-loving mathematician working with a Word-using
biologist can still set their equations using TeX syntax. Word will not
render these at authoring time but, in practice, this causes few problems for
such authors, who are experienced at reading TeX. Within a LaTeX workflow,
equations will be renderable both locally, with source compiled to PDF, and
when published to WordPress.</p>
        <p>There is also a W3C recommendation, MathML, for the representation and
presentation of mathematics. The kblog environment also supports this. In this
case, the equivalent source appears as follows:
&lt;math&gt;
&lt;mrow&gt;
&lt;mi&gt;E&lt;/mi&gt;
&lt;mo&gt;=&lt;/mo&gt;
&lt;mrow&gt;
&lt;mi&gt;m&lt;/mi&gt;
&lt;msup&gt;
&lt;mi&gt;c&lt;/mi&gt;
&lt;mn&gt;2&lt;/mn&gt;
&lt;/msup&gt;
&lt;/mrow&gt;
&lt;/mrow&gt;
&lt;/math&gt;</p>
        <p>One problem with the MathML representation is obvious: it is very
long-winded. A second issue, however, is that it is hard to integrate with existing
workflows; most of the publication workflows we have seen in use will, on
recognising an angle bracket, turn it into the equivalent HTML entity. For some
workflows (LaTeX, asciidoc) it is possible, although not easy, to prevent this within
the native syntax.</p>
        <p>It is also possible to convert from Word's native OMML ("equation editor")
XML representation to MathML, although this does not integrate with Word's
native blog publication workflow. Ironically, it is because MathML shares an
XML-based syntax with the final presentation format (HTML) that the
problem arises. The shortcode syntax, for example, passes straight through most of
the publication frameworks to be consumed by the middleware. From a
pragmatic point of view, therefore, supporting shortcodes and TeX-like syntaxes has
considerable advantages.</p>
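        <p>The contrast between the two markups can be seen in a small Python sketch. It assumes that a publication step behaves like the standard library function html.escape; real authoring tools may escape differently, and the function name publish is hypothetical.</p>
        <preformat><![CDATA[
```python
import html


def publish(fragment):
    """Mimic a workflow step that entity-escapes whatever the author typed."""
    return html.escape(fragment)


mathml = "<math><mi>E</mi></math>"
shortcode = "[latex]E=mc^2[/latex]"

escaped_mathml = publish(mathml)        # angle brackets become entities
escaped_shortcode = publish(shortcode)  # nothing to escape, so unchanged

print(escaped_mathml)
print(escaped_shortcode)
```
]]></preformat>
        <p>The MathML arrives at the middleware as literal text rather than markup, while the shortcode arrives intact and can still be recognised.</p>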
        <p>For the reader, the use of mathjax-latex has significant advantages. The
default mechanism within WordPress uses a math-mode-like syntax, $latex e=mc^2$.
This is rendered using a TeX engine into an image which is then incorporated
and linked using normal HTML capabilities. This representation is opaque and
non-semantic; it has significant limitations for the reader. The images are not
scalable, so zooming in causes severe pixelation; the background to the mathematics
is coloured inside the image, so does not necessarily reflect the local style.</p>
        <p>
          Kblog, however, uses the MathJax library [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]; this has a number of significant
advantages for the reader. First, where the browser supports them, MathJax uses
webfonts to render the equations; these are scalable, attractive and standardized.
Where they are not available, MathJax can fall back to bitmapped fonts. The
reader can also access additional functionality: clicking on an equation will raise
a zoomed-in popup, while the context menu allows access to a textual
representation either as TeX or MathML, irrespective of the form that the author used.
This can be cut and pasted for further use. Kblog uses the MathJax library [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]
to render the underlying TeX directly on the client.
        </p>
        <p>Our use of MathJax imposes no significant disadvantages on the middleware
layers. It is implemented in JavaScript and runs in most environments. Although
the library is fairly large (&gt;100 MB), it is available on a CDN, so need not stress
server storage space. Most of this space comes from the bit-mapped fonts, which
are only downloaded on demand, so should not stress web clients either. It also
obviates the need for a TeX installation, which wp-latex may require (although
this plugin can also use an external server).</p>
        <p>At face value, mathjax-latex necessarily adds very little semantics to the
maths embedded within documents. The maths could be represented as $$E=mc^2$$,
\(E=mc^2\) or
&lt;math&gt; &lt;mrow&gt; &lt;mi&gt;E&lt;/mi&gt; &lt;mo&gt;=&lt;/mo&gt; &lt;mrow&gt; &lt;mi&gt;m&lt;/mi&gt;
&lt;msup&gt; &lt;mi&gt;c&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt; &lt;/msup&gt;
&lt;/mrow&gt; &lt;/mrow&gt; &lt;/math&gt;</p>
        <p>So, we have a heterogeneous representation for identical knowledge. However,
in practice, the situation is much better than this. The author of the work created
these equations and has then read them, transformed by MathJax into a rendered
form. If MathJax has failed to translate them correctly, in line with the author's
intention, or if it has had some implications for the text in addition to setting
the intended equations (if the TeX-style markup appears accidentally elsewhere
in the document), the author is likely to have seen this and fixed the problem.
Someone wishing, for example, to extract all the mathematics as MathML from
these documents computationally, therefore, knows:
- that the document contains maths, as it imports MathJax
- that MathJax is capable of identifying this maths correctly
- that equations can be transformed to MathML using MathJax<sup>7</sup>.</p>
        <p>So, while our publication environment does not result directly in a lower level
of semantic heterogeneity, it does provide the data and the tools to enable the
computational agent to make this transformation. While this is imperfect, it
should help somewhat.</p>
        <p>In short, we provide a practical mechanism to identify text containing
mathematics and a mechanism to transform this to a single, standardised
representation.
7 This is assuming MathJax works correctly in general. The authors and readers are
checking the rendered representation. It is possible that an equation would render
correctly on screen, but be rendered to MathML inaccurately.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Representing References</title>
      <p>We do not yet support LaTeX/BibTeX citations, although we see no
reason why a similar style file should not be supported. We do, however, support
BibTeX-formatted files: the first author's preferred editing/citation environment
is based around these with Emacs, RefTeX, and asciidoc. While this is
undoubtedly a rather niche authoring environment, the (slightly elided) code for
supporting this demonstrates the relative ease with which tool chains can be induced to
support kcite:</p>
      <preformat>;; When the override flag is set, replace RefTeX's own citation
;; formatting with the kcite shortcode formatter defined below.
(defadvice reftex-format-citation (around phil-asciidoc-around activate)
  (if phil-reftex-citation-override
      (setq ad-return-value (phil-reftex-format-citation entry format))
    ad-do-it))

;; Emit a kcite shortcode carrying the DOI of the BibTeX entry, wrapped in
;; an asciidoc pass:[] macro so that asciidoc leaves the brackets alone.
(defun phil-reftex-format-citation (entry format)
  (let ((doi (reftex-get-bib-field "doi" entry)))
    (format "pass:[[cite source='doi'\\]%s[/cite\\]]" doi)))</preformat>
      <p>8 http://wordpress.org/extend/plugins/kcite/
9 http://wordpress.org/extend/plugins/kcite/
10 http://www.datacite.org/
11 http://www.ncbi.nlm.nih.gov/pubmed/
12 http://arxiv.org/</p>
      <p>The key decision with kcite from the authorial perspective is to ignore the
reference list itself and focus only on in-text citations, using public identifiers
to references. This simplifies the tool integration process enormously, as this
is the only data that needs to pass from the author's bibliographic database
onward. The key advantage for authors here is two-fold. First, they are not required to
populate their reference metadata for themselves, and this metadata will update
if it changes. Second, the identifiers are checked; if they are wrong, the authors
will see this straightforwardly, as the entire reference will be wrong. Adding DOIs
or other identifiers moves from being a burden for the author to being
a specific advantage.</p>
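      <p>Because only identifiers travel with the text, a consumer needs very little machinery to recover them. The following Python sketch assumes the shortcode form emitted by the Emacs Lisp code in this section; the regular expression, the function name extract_citations, the example DOI and the use of 'pubmed' as a source value are illustrative assumptions, not a description of kcite's parser.</p>
      <preformat><![CDATA[
```python
import re

# Assumed shortcode form: [cite source='doi']...[/cite]. Other attributes
# or quoting styles are not handled in this sketch.
CITE = re.compile(r"\[cite source='(\w+)'\](.*?)\[/cite\]")


def extract_citations(text):
    """Return (source, identifier) pairs in order of appearance."""
    return [(m.group(1), m.group(2).strip()) for m in CITE.finditer(text)]


# The identifiers below are placeholders, not real references.
post = (
    "As shown previously [cite source='doi']10.1000/example[/cite], and "
    "elsewhere [cite source='pubmed']12345[/cite]."
)
for source, identifier in extract_citations(post):
    print(source, identifier)
```
]]></preformat>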
      <p>While supporting multiple forms of reference identifier (CrossRef DOI,
DataCite DOI, arXiv and PubMed ID) provides a clear advantage to the author, it
comes at considerable cost. While it is possible to get metadata about papers
from all of these sources, there is little commonality between them. Moreover,
resolving this metadata requires one outgoing HTTP request<sup>13</sup> per reference,
which browser security might or might not allow.</p>
      <p>So, while the presentation of mathematics is performed largely on the client,
for reference lists the kcite plugin performs metadata resolution and data
integration on the server. A caching functionality is provided, storing this metadata
in the WordPress database. The bibliographic metadata is finally transferred to
the client encoded as JSON, using asynchronous call-backs to the server.</p>
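      <p>The resolve-once, cache, then serialise pattern described here can be sketched in Python. This is an illustration under stated assumptions, not kcite's implementation (kcite is a WordPress plugin): the class MetadataCache, the function fake_resolver, the dictionary standing in for the WordPress database and the placeholder metadata are all hypothetical.</p>
      <preformat><![CDATA[
```python
import json


class MetadataCache:
    """Resolve each identifier once and reuse the stored result afterwards."""

    def __init__(self, resolver):
        self._resolver = resolver  # callable performing the slow remote lookup
        self._store = {}           # stand-in for the WordPress database
        self.remote_calls = 0      # counts how often the resolver was used

    def get(self, source, identifier):
        key = (source, identifier)
        if key not in self._store:
            self.remote_calls += 1
            self._store[key] = self._resolver(source, identifier)
        return self._store[key]

    def as_json(self, citations):
        """Bundle metadata for (source, identifier) pairs as a JSON string."""
        return json.dumps([self.get(s, i) for s, i in citations])


def fake_resolver(source, identifier):
    # Hypothetical stand-in: a real resolver would query CrossRef, DataCite,
    # PubMed or arXiv over HTTP and normalise their differing responses.
    return {"source": source, "id": identifier, "title": "placeholder title"}


cache = MetadataCache(fake_resolver)
payload = cache.as_json([("doi", "10.1000/example"), ("doi", "10.1000/example")])
print(payload)
print(cache.remote_calls)
```
]]></preformat>
      <p>Citing the same identifier twice triggers a single lookup, which matters when each lookup costs at least one outgoing HTTP request.</p>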
      <p>Finally, this JSON is rendered using the citeproc-js library on the client. In
our experience, this performs well, adding to the reader's experience: in-text
citations are initially shown as hyperlinks; rendering is rapid, even on aging
hardware; and finally, in-text citations are linked both to the bibliography and
directly through to the external source. Currently, the format of the reference
list is fixed; however, citeproc-js is a generalised reference processor, driven using
CSL<sup>14</sup>. This makes it straightforward to change citation format, at the option
of the reader, rather than the author or publisher. Both the in-text citation
and bibliography support outgoing links direct to the underlying resources<sup>15</sup>.
As these links have been used to gather metadata, they are likely to be correct.
While these advantages are relatively small currently, we believe that the use of
JavaScript rendering over linked references can be used to add further reader
value in future.
13 In practice, it is often more; DOI requests, for instance, use 303 redirects.
14 http://citationstyles.org/
15 Where the identifier allows; PubMed IDs redirect to PubMed.</p>
      <p>For the computational agent wishing to consume bibliographic information,
we have added significant value compared to the pre-formatted HTML reference
list. First, all the information required to render the citation is present in the
in-text citation next to the text that the authors intended. A computational
agent can, therefore, ignore the bibliography list itself entirely. These primary
identifiers are, again, likely to be correct because the authors now need them to
be correct for their own benefit.</p>
      <p>Should the computational agent wish, the (denormalised) bibliographic data
used to render the bibliography is actually available, present in the underlying
HTML as a JSON string. This is represented in a homogeneous format, although,
of course, it represents our (kcite's) interpretation of the primary data.</p>
      <p>A final, and subtle, advantage of kcite is that the authors can only use public
metadata, and not their own. If they use the correct primary identifier, and
still get an incorrect reference, it follows that the public metadata must be
incorrect<sup>16</sup>. Authors and readers therefore must ask the metadata providers to
fix their metadata to the benefit of all. This form of data linking, therefore, can
even help those who are not using it.
      </p>
      <p><bold>Microarray Data</bold></p>
      <p>
Many publications require that papers discussing microarray experiments lodge
their data in a publicly available resource such as ArrayExpress [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Authors
do this by placing an ArrayExpress identifier, which has the form E-MEXP-1551.
Currently, adding this identifier to a publication, as with adding the raw data
to the repository, offers no direct advantage to the author, other than fulfilment of
the publication requirement. Similarly, there is no existing support within most
authoring environments for adding this form of reference.
      </p>
      <p>For the knowledgeblog-arrayexpress plugin,<sup>17</sup> therefore, we have again used a
shortcode representation, but allowed the author to automatically fill metadata,
direct from ArrayExpress. So a tag such as:
[aexp id="E-MEXP-1551"]species[/aexp]
will be replaced with Saccharomyces cerevisiae, while:
[aexp id="E-MEXP-1551"]releasedate[/aexp]
will be replaced by "2010-02-24". While the advantage here is small, it is
significant. Hyperlinks to ArrayExpress are automatic, and authors no longer need to look
up detailed metadata. For metadata which authors are likely to know anyway
(such as Species), the automatic lookup operates as a check that their
ArrayExpress ID is correct. As with references 6, the use of an identifier becomes an
advantage rather than a burden to the authors.</p>
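      <p>The substitution can be sketched in Python as follows. This is not the plugin's code: the lookup table is a hypothetical local stand-in for a query to ArrayExpress, holding only the two values quoted above for E-MEXP-1551, and the names METADATA, AEXP and expand_aexp are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical local stand-in for an ArrayExpress lookup; the two values
# are the ones quoted in the text for E-MEXP-1551.
METADATA = {
    "E-MEXP-1551": {
        "species": "Saccharomyces cerevisiae",
        "releasedate": "2010-02-24",
    }
}

AEXP = re.compile(r'\[aexp id="([^"]+)"\](\w+)\[/aexp\]')


def expand_aexp(text, metadata=METADATA):
    """Replace each [aexp] shortcode with the requested metadata field."""

    def substitute(match):
        accession, field = match.group(1), match.group(2)
        record = metadata.get(accession, {})
        # Leave the shortcode untouched when nothing is known, so that a
        # mistyped identifier stays visible to the author.
        return record.get(field, match.group(0))

    return AEXP.sub(substitute, text)


print(expand_aexp('Data from [aexp id="E-MEXP-1551"]species[/aexp].'))
```
]]></preformat>
      <p>Leaving unresolved shortcodes in place is one possible design choice; it mirrors the point that a wrong identifier should be obvious to the author.</p>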
      <p>For the reader, there is currently less advantage, although there is some value
in the added correctness stemming from the ArrayExpress identifier. However,
knowledgeblog-arrayexpress is currently under-developed, and the added semantics
that are now present could be used more extensively. The unambiguous knowledge that:
[aexp id="E-MEXP-1551"]species[/aexp]
represents a species would allow us, for example, to link to the NCBI taxonomy
database.<sup>18</sup>
16 Or, we acknowledge, that kcite is broken!
17 http://knowledgeblog.org/knowledgeblog-arrayexpress</p>
      <p>Likewise, the advantage for the computational agent from
knowledgeblog-arrayexpress is currently limited; the identifiers are clearly marked up, and as the
authors now care about them, they are likely to be correct. Again, however,
knowledgeblog-arrayexpress is currently under-developed for the computational
agent. The knowledge that is extracted from ArrayExpress could be presented
within the HTML generated by knowledgeblog-arrayexpress, whether or not it
is displayed to the reader, for essentially no cost. By having an underlying
shortcode representation, if we choose to add this functionality to
knowledgeblog-arrayexpress, any posts written using it would automatically update their HTML.
For the text-mining bioinformatician, even the ability to unambiguously
determine that a paper described or used a data set relating to a specific species using
standardised nomenclature<sup>19</sup> would be a considerable boon.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>Our approach to semantic enrichment of articles is a measured and evolutionary
approach. We are investigating how we can increase the amount of knowledge in
academic articles presented in a computationally accessible form. However, we
are doing so in an environment which does not require all the different aspects
of authoring and publishing to be overturned. Moreover, we have followed a
strong principle of semantic enhancement which offers advantages to both reader
and author immediately. So, adding references as a DOI, or other identifier,
'automagically' produces an in-text citation and a nicely formatted reference list;
that the reference list is no longer present in the article, but is a visualisation
over linked data, and that the article itself has become a first-class citizen of this
linked data environment, is a happy by-product.</p>
      <p>This approach, however, also has disadvantages. There are a number of
semantic enhancements which we could make straightforwardly to the
knowledgeblog environment that we have not; the principles that we have adopted require
significant compromise. We offer here two examples.</p>
      <p>
        First, there has been significant work by others on CiTO [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], an
ontology which helps to describe the relationship between the citations and a paper.
Kcite lays the groundwork for an easy and straightforward addition of CiTO
tags surrounding each in-text citation. Doing so would enable increased
machine understandability of a reference list. Potentially, we could use this to the
advantage of the reader also: we could distinguish between reviews and primary
research papers; highlight the authors' previous work; emphasise older papers
which are being refuted. However, to do this requires additional semantics from
the author. Although these CiTO semantic enhancements would be easy to insert
directly using the shortcode syntax, most authors will want to use their existing
reference manager, which will not support this form of semantics; even if it does,
the authors themselves gain little advantage from adding these semantics. There
are advantages for the reader, but in this case not for both author and reader.
As a result, we will probably add such support to kcite; but, if we are honest, we
find it unlikely that, when acting as content authors, we will find the time to add
these additional semantics.
18 http://www.ncbi.nlm.nih.gov/Taxonomy/
19 The standard nomenclature was only invented in 1753 and is still not used universally.
      </p>
      <p>Second, our presentation of mathematics could be modified to automatically
generate MathML from any included TeX markup. The transformation could
be performed on the server, using MathJax; MathML would still be rendered
on the client using webfonts. This would mean that any embedded maths would
be discoverable because of the existence of MathML, which is a considerable
advantage. However, neither the reader nor the author gains any advantage from
doing this, while paying the cost of the slower load times and higher server load
that would result from running JavaScript on the server. Moreover, they would
pay this cost regardless of whether their content were actually being consumed
computationally. As the situation now stands, the computational user needs to
identify the inclusion of MathJax in the web page, and then transform the page
using this library, none of which is standard. This is clearly a serious compromise,
but we feel a necessary one.</p>
      <p>Our support for microarrays offers the possibility of the most specific and
increased level of semantics of all of our plugins. Knowledge about a species or
a microarray experimental design can be very precisely represented. However,
almost by definition, this form of knowledge is fairly niche and only likely to be
of relevance to a small community. We do note, though, that the knowledgeblog
process based around commodity technology does offer a publishing process that
can be adapted, extended and specialised in this way relatively easily. Ultimately,
the many small communities that make up the long tail of scientific publishing
add up to one large one.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Semantic publishing is a desirable goal, but goals need to be realistic and
achievable. To move towards semantic publishing in kblog, we have tried to put in place
an approach that gives benefit to readers, authors and computational
interpretation. As a result, at this stage, we have light semantic publishing, but with
small, but definite benefits for all.</p>
      <p>Semantics give meaning to entities. In kblog, we have sought benefit by
"saying" within the kblog environment that entity x is either maths, a citation or a
microarray data entity reference. This is sufficient for the kblog infrastructure
to "know what to do" with the entity in question. Knowing that some
publishable entity is a "lump" of maths tells the infrastructure how to handle that
entity: the reader benefits from it looking like maths; the author benefits
by not having to do very much; and the infrastructure knows what to do. In
addition, this approach leaves in hooks for doing more later.</p>
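      <p>The idea that naming the type of an entity is enough for the infrastructure to "know what to do" can be sketched as a small dispatch table. The Python below is a hypothetical illustration only: the bracketed shortcode syntax, the tag names (<monospace>latex</monospace>, <monospace>cite</monospace>, <monospace>microarray</monospace>), the handler functions and the markup they emit are all assumptions, not the kblog plugins themselves.</p>
      <preformat><![CDATA[
```python
import re


def handle_maths(body):
    # Wrap TeX in delimiters that a client-side renderer could pick up.
    return "\\(" + body + "\\)"


def handle_citation(body):
    # Mark the identifier so a later step could resolve it to a reference.
    return '<span class="citation">' + body + "</span>"


def handle_microarray(body):
    # Mark the accession so a later step could link it to its data set.
    return '<span class="microarray">' + body + "</span>"


# The only "semantics" needed: a mapping from entity type to handler.
HANDLERS = {
    "latex": handle_maths,
    "cite": handle_citation,
    "microarray": handle_microarray,
}

# Matches [name]body[/name]; the closing tag must repeat the opening name.
SHORTCODE = re.compile(r"\[(\w+)\](.*?)\[/\1\]", re.DOTALL)


def expand(text):
    """Replace each known shortcode with its handler's output."""

    def replace(match):
        handler = HANDLERS.get(match.group(1))
        if handler is None:
            # Unknown entity types are left untouched.
            return match.group(0)
        return handler(match.group(2).strip())

    return SHORTCODE.sub(replace, text)


print(expand("Let [latex]x^2[/latex] hold, as in [cite]10.1000/example[/cite]."))
```
]]></preformat>
      <p>New entity types are the "hooks for doing more later": in this sketch, supporting one means adding a single entry to the table, with no change to what the author types for the existing ones.</p>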
      <p>It is not necessarily easy to find compelling examples that give advantages
for all steps. Adding in CiTO attributes to citations, for instance, has obvious
advantages for the reader, but not the author. However, advantages may be
indirect; richer reader semantics may give more readers and thus more citations,
the thing authors appreciate as much as the act of publishing itself. It is, however,
difficult to imagine how such advantages can be conveyed to the author at the
point of writing. It is easy to see the advantages of semantic publishing for
readers; as a community we need to pay attention to advantages to the authors.
Without these "carrots", we will only have "sticks", and authors, particularly
technically skilled ones, are highly adept at working around sticks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Shadbolt</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The semantic web revisited</article-title>
          .
          <source>Intelligent Systems, IEEE</source>
          <volume>21</volume>
          (
          <issue>3</issue>
          ) (
          <year>2006</year>
          )
          <fpage>96</fpage>
          –
          <lpage>101</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Shotton</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Semantic publishing: the coming revolution in scientific journal publishing</article-title>
          .
          <source>Learned Publishing</source>
          <volume>22</volume>
          (
          <issue>2</issue>
          ) (
          <year>2009</year>
          )
          <fpage>85</fpage>
          –
          <lpage>94</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Shotton</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Portwin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klyne</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miles</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Adventures in semantic publishing: exemplar semantic enhancements of a research article</article-title>
          .
          <source>PLoS Computational Biology</source>
          <volume>5</volume>
          (
          <issue>4</issue>
          ) (
          <year>2009</year>
          ) e1000361
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linked data - the story so far</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems (IJSWIS)</source>
          <volume>5</volume>
          (
          <issue>3</issue>
          ) (
          <year>2009</year>
          )
          <fpage>1</fpage>
          –
          <lpage>22</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lamport</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The LaTeX book</article-title>
          .
          <source>Addison-Wesley</source>
          , Reading, MA (
          <year>1984</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lord</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cockell</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swan</surname>
            ,
            <given-names>D.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>The ontogenesis knowledgeblog: Lightweight publishing about semantics, with lightweight semantic publishing</article-title>
          .
          <source>In: Semantic Web Technologies for Libraries and Readers</source>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Wordpress: http://www.wordpress.org.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hull</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolstencroft</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goble</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pocock</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oinn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Taverna: a tool for building and running workflows of services</article-title>
          .
          <source>Nucleic Acids Res</source>
          <volume>34</volume>
          (
          <issue>Web Server issue</issue>
          ) (
          <year>Jul 2006</year>
          )
          <fpage>729</fpage>
          –
          <lpage>732</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Knuth</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          :
          <article-title>The TeX Book</article-title>
          . 3rd edn.
          <source>Addison-Wesley</source>
          , Reading, MA (
          <year>1986</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. WP Latex: http://wordpress.org/extend/plugins/wp-latex/.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Mathjax-Latex: http://wordpress.org/extend/plugins/mathjax-latex/.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Mathjax: http://www.mathjax.org.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Mendeley: http://www.mendeley.org.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. Zotero: http://www.zotero.org.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Brazma</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parkinson</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarkans</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shojatalab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vilo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abeygunawardena</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holloway</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kapushesky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kemmeren</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lara</surname>
            ,
            <given-names>G.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oezcimen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rocca-Serra</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sansone</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          :
          <article-title>ArrayExpress - a public repository for microarray gene expression data at the EBI</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>31</volume>
          (
          <issue>1</issue>
          ) (
          <year>2003</year>
          )
          <fpage>68</fpage>
          –
          <lpage>71</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Shotton</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>CiTO, the Citation Typing Ontology</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          <volume>1</volume>
          (
          <issue>Suppl 1</issue>
          ) (
          <year>2010</year>
          )
          S6
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>