<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scientific Names and Descriptions for Organisms on the Semantic Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nathan Wilson</string-name>
          <email>nwilson@mbl.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Han Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Deborah McGuinness</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Marine Biological Laboratory</institution>
          ,
          <addr-line>7 MBL St., Woods Hole, MA 02556</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Rensselaer Polytechnic Institute</institution>
          ,
          <addr-line>110 8th Street, Troy, NY 12180</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>An efficient process for creating precise, accurate, machine-interpretable morphological descriptions of groups of organisms is needed to gather observations of the world's biodiversity more effectively. While morphological descriptions are required for the publication of modern scientific names, these descriptions are commonly revised after the initial publication, which can lead to data loss. A system for creating and naming machine-interpretable descriptions of groups of organisms, the Semantic Vernacular System, is proposed as a solution for creating such descriptions and managing their relationship to formal scientific names while improving the collection of observational data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        as the ‘type specimen’. For all names, the organisms included are assumed to belong to
a single evolutionary lineage. A formal description or ‘circumscription’ is required for a
name to be validly published. The circumscription is intended to separate the described group
of organisms from those described by other names at the same level. The primacy of the type
specimen and the evolutionary lineage is demonstrated by the frequent revision of circumscriptions
when new evolutionary evidence is found with respect to the type specimen. For example, the
circumscription of Laetiporus sulphureus, the Sulphur Shelf Mushroom, was significantly reduced
when mating studies showed that the original circumscription included at least five distinct species
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The fundamental justification for this approach is to tie scientific names to the evolutionary
relationships between organisms. A side effect of these rules is that a given scientific name may
have multiple circumscriptions and a given circumscription may apply to multiple scientific names.
      </p>
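      <p>The many-to-many relationship between scientific names and circumscriptions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the proposed system: the names (‘Name A’, ‘Name B’), the circumscription labels (‘C1’, ‘C2’) and the helpers build_index and is_ambiguous are hypothetical placeholders, not real taxonomic data. The sketch indexes the relation in both directions and flags a bare name as ambiguous when more than one circumscription is in use for it.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict

# Hypothetical (scientific name, circumscription) pairs.
# All labels are placeholders, not real taxonomic data.
PAIRS = [
    ("Name A", "C1"),  # original, broad circumscription of Name A
    ("Name A", "C2"),  # later, narrower circumscription of Name A
    ("Name B", "C2"),  # the same circumscription also in use for Name B
]


def build_index(pairs):
    """Index a many-to-many relation in both directions."""
    by_name = defaultdict(set)
    by_circ = defaultdict(set)
    for name, circ in pairs:
        by_name[name].add(circ)
        by_circ[circ].add(name)
    return by_name, by_circ


def is_ambiguous(by_name, name):
    """A bare name is ambiguous if more than one circumscription is in use."""
    return len(by_name.get(name, set())) > 1


by_name, by_circ = build_index(PAIRS)
print(sorted(by_name["Name A"]))        # ['C1', 'C2']
print(sorted(by_circ["C2"]))            # ['Name A', 'Name B']
print(is_ambiguous(by_name, "Name A"))  # True
print(is_ambiguous(by_name, "Name B"))  # False
```
      </preformat>
      <p>In this sketch, recording only ‘Name A’ on an observation cannot tell a later reader whether C1 or C2 was intended, which is exactly the information lost when only a name is recorded.</p>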
      <p>While many biodiversity observation systems encourage photographs and notes to document an
observation, frequently a scientific name is simply asserted with no explicit evidence. When only a
name is provided, information is immediately lost for any name that has multiple
circumscriptions in current use. For example, if an occurrence of Laetiporus sulphureus is recorded, it is not
clear which existing circumscription was intended. This loss is compounded by any later
revisions unless the users of that data carefully track when each new circumscription
was created relative to each observation.</p>
      <p>
        One possible approach would be for biodiversity observation systems to require that users provide
an explicit reference to the circumscription they used when making an identification. Sites such as
the Encyclopedia of Life (EOL) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and the Biodiversity Heritage Library [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are beginning to provide
online resources that could be used for this purpose. However, these resources are still far from comprehensive,
so finding and documenting the appropriate references remains a time-consuming process that is
effectively impossible for many observers. Even if references were readily available, this approach would
remain problematic because it would not require recording the observed features needed to validate the identifications in the future.
      </p>
      <p>A better approach from a data management standpoint would be to require a person making
an identification to be explicit about the features on which they based their identification. However, given
that many groups lack even a standardized terminology, this in effect means the identifier would
have to write a detailed description for each identification. This approach would better support validation,
but the results would depend heavily on the consistency and thoroughness of the identifier.
Both of these approaches would frequently impose a significant overhead burden on the taxonomic
experts.</p>
    </sec>
    <sec id="sec-2">
      <title>Semantic Vernacular System</title>
      <p>
        The proposed alternative is to create the Semantic Vernacular System, which enables authoring
named, machine-interpretable definitions of groups of organisms that are then associated with sets
of scientific names. The Web Ontology Language (OWL) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is a natural fit for such a definitional system
and is already actively used by significant parts of the relevant biology communities, including
members of the NSF Phenotype Ontology Research Collaboration Network.
      </p>
      <p>Just as with scientific names, it would be valuable for the new system to be peer-reviewed, strictly
prevent the reuse of names, apply any agreed-upon nomenclatural rules, and avoid the unintended
re-publishing of an existing description. The envisioned Semantic Vernacular Descriptions, or SVDs,
will be ‘born digital’ in a freely available, global repository. This will allow all of these desirable
features to be applied consistently and automatically from the ground up. Once an SVD has been
approved through the peer-review process, the association of the chosen name with that semantic
description will be considered a strict definition that should never change, just as the association
between a scientific name and its type specimen should never change. This tight association means
that a particular SVD applies to any observed organism that matches that semantic description.
SVDs will allow users to apply precise sets of features by name to their observations. Biodiversity
observation systems that use the Semantic Vernacular System will let users automatically review and even explicitly
confirm the defining features for any name they apply. While there will, of course, be human error
in the application of such names and even of specific features, the data recorded by the observer will
be explicit and will not degrade over time. For Laetiporus sulphureus, this means a separate name
for the historical circumscription as well as names for each morphologically distinct group, whether
that group includes multiple species, a single species, or a distinctive subset of one or more species.</p>
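      <p>The rule that an approved SVD is a fixed definition applying to any organism that matches it can be illustrated with a small Python sketch. It assumes, purely for illustration, that an SVD can be reduced to a name plus a set of (character, state) pairs and that an observation is a set of such pairs; the class SVD, the method applies_to, the name ‘example-svd-1’ and the feature labels are hypothetical and are not taken from any published ontology or from the Semantic Vernacular System itself, which the paper proposes to build on OWL. A frozen dataclass with a frozenset of features means the definition cannot be reassigned or mutated through normal attribute access once created.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


# Hypothetical sketch: an SVD as an immutable name plus a fixed set of
# (character, state) pairs. Feature labels are illustrative placeholders.
@dataclass(frozen=True)
class SVD:
    name: str
    features: frozenset  # frozenset of (character, state) tuples

    def applies_to(self, observed):
        """True when every defining feature appears among the observed ones."""
        return self.features.issubset(observed)


svd = SVD(
    name="example-svd-1",  # hypothetical SVD name
    features=frozenset({("growth form", "shelf"), ("pore surface", "yellow")}),
)

obs = {("growth form", "shelf"), ("pore surface", "yellow"), ("substrate", "oak")}
print(svd.applies_to(obs))                         # True
print(svd.applies_to({("growth form", "shelf")}))  # False
```
      </preformat>
      <p>Extra observed features (here the substrate) do not prevent a match; only a missing defining feature does.</p>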
      <p>
        The proposed system will also support the registration and naming of descriptions that
correspond to common observational experiences. This will allow users to record what are in effect
partial identifications to groups of ‘look-alikes’ that may or may not include all the members of
an evolutionary lineage. Currently, groups of species that are difficult to tell apart in the field,
sometimes referred to as ‘sibling species’ [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or more generally ‘cryptic species’, are handled in a
variety of inconsistent ways, including using higher-level taxa such as genera or families, modifying
species names by inserting ‘aff.’ or ‘cf.’, adding ‘group’ or ‘complex’ at the end of the name, or
using informal names such as ‘Comic Tern’ or ‘Circus macrourus/cyaneus’. The ever-improving resolution
of genetic information will widen the gap between formally recognized species and recognizable
groups of organisms [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The Semantic Vernacular System will provide a way to formally define
the recognizable groups while maintaining the many-to-many relationship between SVDs and
traditional scientific names. These relationships will allow users to continue to use scientific names as an
entry point into the system from which to find SVDs that help them efficiently describe their observations.
      </p>
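      <p>Using scientific names as an entry point amounts to inverting the links from SVDs to names. The Python sketch below assumes those links are available as a simple dictionary; the SVD identifiers, the placeholder species names and the helper names_to_svds are hypothetical and only illustrate the lookup, not how the repository would actually store it.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict

# Hypothetical many-to-many links from SVD identifiers to scientific names.
# All identifiers and names below are placeholders.
SVD_TO_NAMES = {
    "svd-look-alike-group": {"Species one", "Species two"},
    "svd-species-one-form": {"Species one"},
    "svd-broad-group": {"Species one", "Species two", "Species three"},
}


def names_to_svds(svd_to_names):
    """Invert the SVD -> names links so a scientific name can be the entry point."""
    index = defaultdict(set)
    for svd_id, names in svd_to_names.items():
        for name in names:
            index[name].add(svd_id)
    return index


index = names_to_svds(SVD_TO_NAMES)
print(sorted(index["Species two"]))  # ['svd-broad-group', 'svd-look-alike-group']
```
      </preformat>
      <p>A user who starts from ‘Species two’ would be offered both the look-alike group and the broader group as candidate SVDs for describing an observation.</p>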
      <p>
        Populating such a system with meaningful descriptions will not be simple. It must start with the
development of ontologies that capture and standardize the terminology needed to describe observed
features. This work has begun for some groups, such as Teleost fish [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the Hymenoptera [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
These efforts have created workflows for collaboratively developing such ontologies. We expect the
work of creating SVDs to start with groups of organisms that are not well handled by existing
nomenclatural codes, such as polyphyletic groups and cryptic species complexes. Familiarity with
the emerging ontologies will in turn encourage users to describe and name SVDs for common,
larger monophyletic groups and eventually common species. The natural bias towards common
observational experiences will focus the system on the areas where it is expected to have the
greatest value. Complete coverage of all existing scientific names is neither necessary nor expected.
      </p>
      <p>
        A demonstration system [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] created in collaboration with Mushroom Observer and the
EOL is available at http://mushroomobserver.org/semantic_vernacular. Mushrooms are an
excellent example case for exploring these issues, since many circumscriptions and observations
are based solely on the mushroom, which is roughly equivalent to the flower of the larger fungal
organism. As a result, there are many examples of polyphyletic groups and species complexes that
are difficult to identify to species based on easily observed characteristics. The EOL is a natural
aggregator for such descriptions, as it already supports polyphyletic, provisional and other
non-standard names that are provided to the system through its content providers. In addition, the
EOL is actively working to support ‘computable’ data using semantic web technology.
      </p>
      <p>Finally, the proposed machine-interpretable definitions naturally lead to a novel system for
computer-aided identification of observations. As users learn to describe their observations in the
same way that the descriptions are stored, it will be straightforward for the system to indicate
which existing SVDs match the given features and the implied set of potential scientific names.
This position paper describes the need for a new class of names tightly associated with semantic
descriptions of groups of organisms. We outline the creation of the Semantic Vernacular System for
managing these new names and descriptions. This system will enable more precise and accurate
observations of biodiversity with minimal additional overhead while encouraging the creation of
machine-interpretable descriptions with clear connections to traditional scientific names.</p>
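      <p>The computer-aided identification step, matching observed features against stored SVDs and deriving the implied scientific names, might look roughly like the Python sketch below. It rests on two assumptions that are not stated in the paper: that each SVD reduces to a set of required (character, state) pairs, and that linking an SVD to a set of names means every organism matching the SVD belongs to one of those names. Under the second assumption, an observation matching several SVDs is constrained to the intersection of their name sets. The function identify, the SVD identifiers and all feature and species labels are hypothetical placeholders.</p>
      <preformat preformat-type="code">
```python
# Hypothetical sketch of feature-based identification. Each SVD has a set of
# defining (character, state) features and a set of linked scientific names.
# Every identifier and label here is a placeholder.
SVDS = {
    "svd-1": {
        "features": {("growth form", "shelf"), ("pore surface", "yellow")},
        "names": {"Species one", "Species two"},
    },
    "svd-2": {
        "features": {("growth form", "shelf"), ("pore surface", "white")},
        "names": {"Species two"},
    },
    "svd-3": {
        "features": {("growth form", "shelf")},
        "names": {"Species one", "Species two", "Species three"},
    },
}


def identify(observed, svds):
    """Return the SVDs whose defining features are all observed, and the
    scientific names consistent with every matching SVD (assumed semantics:
    names linked to an SVD cover every organism that matches it)."""
    matches = sorted(
        svd_id for svd_id, svd in svds.items()
        if svd["features"].issubset(observed)
    )
    if not matches:
        return [], set()
    names = set.intersection(*(svds[svd_id]["names"] for svd_id in matches))
    return matches, names


matches, names = identify({("growth form", "shelf"), ("pore surface", "yellow")}, SVDS)
print(matches)        # ['svd-1', 'svd-3']
print(sorted(names))  # ['Species one', 'Species two']
```
      </preformat>
      <p>If the links were instead read as a looser ‘may include’ relation, taking the union of the name sets would be the appropriate choice; which reading fits the proposed system is a design decision the sketch does not settle.</p>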
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Artportalen, http://artportalen.se</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Biodiversity Heritage Library, http://biodiversitylibrary.org</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Encyclopedia of Life, http://eol.org</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. International Code of Nomenclature of Bacteria: Bacteriological Code, 1990 Revision. ASM Press, Washington, DC, USA (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <source>International Code of Zoological Nomenclature</source>
          , 4th edn. The International Trust for Zoological Nomenclature, London, UK (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Burdsall</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banik</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          :
          <article-title>The Genus Laetiporus in North America</article-title>
          .
          <source>Harvard Papers in Botany</source>
          <volume>6</volume>
          (
          <issue>1</issue>
          ),
          <fpage>43</fpage>
          –
          <lpage>55</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dahdul</surname>
            ,
            <given-names>W.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lundberg</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Midford</surname>
            ,
            <given-names>P.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balhoff</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lapp</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vision</surname>
            ,
            <given-names>T.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haendel</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Westerfield</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mabee</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          :
          <article-title>The Teleost Anatomy Ontology: Anatomical Representation for the Genomics Age</article-title>
          .
          <source>Systematic Biology</source>
          <volume>59</volume>
          ,
          <fpage>369</fpage>
          –
          <lpage>383</lpage>
          (
          <year>2010</year>
          ), doi:10.1093/sysbio/syq013
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Knapp</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNeill</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turland</surname>
            ,
            <given-names>N.J.</given-names>
          </string-name>
          :
          <article-title>Changes to Publication Requirements Made at the XVIII International Botanical Congress in Melbourne - What Does e-Publication Mean for You?</article-title>
          <source>PhytoKeys</source>
          <volume>6</volume>
          (
          <issue>0</issue>
          ),
          <fpage>5</fpage>
          –
          <lpage>11</lpage>
          (
          <year>2011</year>
          ), doi:10.3897/phytokeys.6.1960
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Knowlton</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Sibling Species in the Sea</article-title>
          .
          <source>Annual Review of Ecology and Systematics</source>
          <volume>24</volume>
          ,
          <fpage>189</fpage>
          –
          <lpage>216</lpage>
          (
          <year>1993</year>
          ), doi:10.1146/annurev.es.24.110193.001201
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mayr</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>The Bearing of the New Systematics on Genetical Problems. The Nature of Species</article-title>
          .
          <source>Advances in Genetics</source>
          <volume>2</volume>
          ,
          <fpage>205</fpage>
          –
          <lpage>237</lpage>
          (
          <year>1948</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>OWL Web Ontology Language Overview</article-title>
          . World Wide Web Consortium (W3C) Recommendation (
          <year>2004</year>
          ), http://www.w3.org/TR/owl-features/
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Miko</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deans</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          :
          <article-title>Masner, a New Genus of Ceraphronidae (Hymenoptera: Ceraphronoidea) Described Using Controlled Vocabularies</article-title>
          .
          <source>ZooKeys</source>
          <volume>20</volume>
          ,
          <fpage>127</fpage>
          –
          <lpage>153</lpage>
          (
          <year>2009</year>
          ), doi:10.3897/zookeys.20.119
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Patterson</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cooper</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirk</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pyle</surname>
            ,
            <given-names>R.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Remsen</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          :
          <article-title>Names are Key to the Big New Biology</article-title>
          .
          <source>Trends in Ecology &amp; Evolution</source>
          <volume>25</volume>
          (
          <issue>12</issue>
          ),
          <fpage>686</fpage>
          –
          <lpage>691</lpage>
          (
          <year>2010</year>
          ), doi:10.1016/j.tree.2010.09.004
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Sato</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yumoto</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murakami</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Cryptic Species and Host Specificity in the Ectomycorrhizal Genus Strobilomyces (Strobilomycetaceae)</article-title>
          .
          <source>American Journal of Botany</source>
          <volume>94</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1630</fpage>
          –
          <lpage>1641</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sullivan</surname>
            ,
            <given-names>B.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wood</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iliff</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonney</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fink</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>eBird: a Citizen-based Bird Observation Network in the Biological Sciences</article-title>
          .
          <source>Biological Conservation</source>
          <volume>142</volume>
          ,
          <fpage>2282</fpage>
          –
          <lpage>2292</lpage>
          (
          <year>2009</year>
          ), doi:10.1016/j.biocon.2009.05.006
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ueda</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loarie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          : iNaturalist, http://inaturalist.org
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>E.O.</given-names>
          </string-name>
          :
          <article-title>The Future of Life</article-title>
          . Random House Digital, Inc. (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dunn</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          : Application of Semantic Technology to Define Names for Fungi.
          <source>Tech. rep., Tetherless World Constellation at Rensselaer Polytechnic Institute</source>
          (
          <year>2012</year>
          ), http://tw.rpi.edu/web/doc/ApplicationofSemanticTechnologytoDefineNamesforFungi
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
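          19.
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hollinger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          : Mushroom Observer, http://mushroomobserver.org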
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Yoder</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miko</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seltmann</surname>
            ,
            <given-names>K.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertone</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deans</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          :
          <article-title>A Gross Anatomy Ontology for Hymenoptera</article-title>
          .
          <source>PLoS ONE</source>
          <volume>5</volume>
          (
          <issue>12</issue>
          ),
          <elocation-id>e15991</elocation-id>
          (
          <year>2010</year>
          ), doi:10.1371/journal.pone.0015991
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>