<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>From data portal to knowledge portal: Leveraging semantic technologies to support interdisciplinary studies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiaogang Ma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick West</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Erickson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephan Zednik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yu Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Han Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hao Zhong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Fox</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tetherless World Constellation, Rensselaer Polytechnic Institute</institution>
          ,
          <addr-line>Troy, NY, USA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Scientific research practices regularly adopt new technologies and platforms in an effort to increase the timeliness, sharing and discoverability of information. Many initiatives address open data, open code, open access and open collections, which together make up the topic of Open Science in academia. Being open has two levels of meaning. The first is to make data, code, sample collections, publications and similar resources freely accessible online. The second is to annotate and connect those resources in order to establish the provenance information needed for reproducible scientific research. In this paper we present our work on a web portal for the Deep Carbon Observatory (DCO) community [<xref ref-type="bibr" rid="ref1">1</xref>]. The DCO is a 10-year (2009-2019) initiative to intensify global attention and scientific effort in the burgeoning field of deep carbon science. Inspired by guiding questions such as “how much carbon does Earth contain?”, “where is it?” and “what can deep carbon tell us about origins?”, more than 1000 scientists across the world are actively participating in the DCO community. The DCO web portal is a research collaboration website developed to keep track of all researchers, organizations, instruments, field sites and research outputs related to the DCO community. We intend the DCO web portal to be a knowledge portal that adopts state-of-the-art semantic technologies to support the various stages of the scientific process within and beyond the DCO community.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>eScience</kwd>
        <kwd>Knowledge Portal</kwd>
        <kwd>Ontologies</kwd>
        <kwd>Data Stewardship</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>A model of the science network</title>
      <p>
        The context of our work is the Semantic Web, which is defined as an extension of the current Web that adds machine-readable meaning and context to information on the Web. In this way, the Web is being transformed from a Web of Documents into a Web of Data [<xref ref-type="bibr" rid="ref2">2</xref>]. Ontologies are an important way of capturing and representing machine-readable meaning. An ontology is the formal specification of the shared conceptualization of a domain of study. In our work on the DCO web portal, an initial step was the development of domain-specific ontologies and the integration of existing ontologies. Our portal adapted the VIVO system as a platform for metadata management. The VIVO system itself already uses a set of ontologies to support academic information management. We further extended the VIVO system by developing a DCO ontology and importing several other ontologies, such as the PROV Ontology [<xref ref-type="bibr" rid="ref3">3</xref>] for provenance documentation and DCAT [<xref ref-type="bibr" rid="ref4">4</xref>] to represent datasets and data catalogs. Table 1 lists the key ontologies and schemas used in the web portal.
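      </p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Prefix</th><th>Ontology or schema</th><th>Namespace</th></tr>
          </thead>
          <tbody>
            <tr><td>dc</td><td></td><td></td></tr>
            <tr><td>dct</td><td></td><td>http://purl.org/dc/terms/</td></tr>
            <tr><td>vivo</td><td></td><td>http://vivoweb.org/ontology/core#</td></tr>
            <tr><td>scires</td><td></td><td>http://vivoweb.org/ontology/scientificresearch#</td></tr>
            <tr><td>dcat</td><td>Data Catalog Vocabulary (DCAT)</td><td>http://www.w3.org/ns/dcat#</td></tr>
            <tr><td>bibo</td><td></td><td>http://purl.org/ontology/bibo/</td></tr>
            <tr><td>c4o</td><td>Citation Counting and Context Characterization Ontology</td><td>http://purl.org/spar/c4o/</td></tr>
            <tr><td>cito</td><td>Citation Typing Ontology</td><td>http://purl.org/spar/cito/</td></tr>
            <tr><td>fabio</td><td>FRBR-Aligned Bibliographic Ontology</td><td>http://purl.org/spar/fabio/</td></tr>
            <tr><td>event</td><td>Event Ontology</td><td>http://purl.org/NET/c4dm/event.owl#</td></tr>
            <tr><td>foaf</td><td>Friend of a Friend</td><td>http://xmlns.com/foaf/0.1/</td></tr>
            <tr><td>vcard</td><td>vCard Ontology</td><td>http://www.w3.org/2006/vcard/ns#</td></tr>
            <tr><td>geo</td><td>Geopolitical Ontology</td><td>http://aims.fao.org/aos/geopolitical.owl#</td></tr>
            <tr><td>skos</td><td>Simple Knowledge Organization System</td><td>http://www.w3.org/2004/02/skos/core#</td></tr>
            <tr><td>dco</td><td>DCO Ontology</td><td>http://info.deepcarbon.net/schema#</td></tr>
            <tr><td>prov</td><td>PROV Ontology</td><td>http://www.w3.org/ns/prov#</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>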
It should be noted that these ontologies are not isolated from one another. Instead,
they are integrated into a single knowledge graph that represents the various agents,
entities and activities in the DCO scientific community. Ontology reuse and
inter-mapping establish the relationships among the components of this knowledge graph. For
example, the bibo, c4o, cito and fabio ontologies represent the network of
bibliographic and citation information among various types of publications, foaf
represents the network of researchers and organizations, and vivo and dco further extend the
inter-connections among those components and other objects such as research topics,
grants, projects and awards. Provenance documentation leverages the W3C
recommendation prov, which provides a high-level framework. Classes and
properties in other ontologies, such as dco, vivo and foaf, can be mapped as subclasses
and subproperties of the corresponding classes and properties in prov. Moreover, the
knowledge graph can be extended according to real-world needs. This applies especially to dco,
an ontology that we created and curate for the DCO community.</p>
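      <p>The subclass mapping onto prov can be illustrated with a minimal Python sketch. The prefixes come from Table 1, but every specific link in the sketch (for example dco:Dataset under vivo:Dataset, or foaf:Person under prov:Agent) is an assumption made for illustration and is not taken from the published DCO ontology.</p>
      <preformat preformat-type="code">
```python
from typing import Dict, List

# Hypothetical rdfs:subClassOf links, invented for illustration only.
SUBCLASS_OF: Dict[str, str] = {
    "foaf:Person": "prov:Agent",
    "foaf:Organization": "prov:Agent",
    "vivo:Dataset": "prov:Entity",
    "dco:Dataset": "vivo:Dataset",
    "dco:FieldStudy": "prov:Activity",
}


def superclasses(cls: str) -> List[str]:
    """Follow subclass links upward and return every ancestor, nearest first."""
    chain: List[str] = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls in chain:  # guard against accidental cycles in the mapping
            break
        chain.append(cls)
    return chain


def prov_role(cls: str) -> str:
    """Return the first prov class reachable from cls, or "" if there is none."""
    for ancestor in superclasses(cls):
        if ancestor.startswith("prov:"):
            return ancestor
    return ""


if __name__ == "__main__":
    for name in ("dco:Dataset", "foaf:Person", "dco:FieldStudy"):
        print(name, "maps to", prov_role(name))
```
</preformat>
      <p>Under these assumed links, a class defined in dco inherits its provenance role through vivo, so a provenance query phrased only in prov terms would still reach it.</p>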
    </sec>
    <sec id="sec-2">
      <title>Annotation and linking to create semantics</title>
      <p>With a knowledge graph at its core, the DCO web portal consists of four
major parts: Drupal, a content management system used as the main front-end web
portal where users can register, discover and retrieve various types of objects; the Handle
System, which assigns a persistent and unique identifier, known as a DCO-ID, to each
object; VIVO, the main knowledge store; and CKAN, used for the
storage and archiving of datasets and other media (Fig. 1). Object registration,
discovery and retrieval can all be carried out through the DCO-ID, similar to
what the Digital Object Identifier (DOI) does for publications.</p>
      <p>The functionalities of the web portal enable an individual researcher to record almost
all the components in the life cycle of their research, including funding applications,
instrument deployment, field work planning, data collection, data analysis, meeting
records, publication archival and project reporting. The portal also allows
researchers who share a common research interest to find, communicate with and
collaborate with one another through virtual groups. All instance objects can be annotated
with properties from the corresponding ontologies and can be linked directly
or indirectly to other objects. For example, a journal paper can be tagged with a few
keywords as its topics. One or more of those keywords may also represent
the research interests of a researcher, who might find that paper by searching
on the keywords, and from the keywords the researcher may in turn find other
publications or researchers within the same domain. Such annotations and interconnections
among instance objects provide a more detailed network of the real-world situation
and can expand our understanding of the research to an extent that cannot be reached
by reading conventional publications alone.</p>
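      <p>The keyword example above can be sketched as a small lookup over shared annotations. This is a minimal illustration in Python, not the portal's implementation: the object names and keywords are invented, and a real deployment would query the knowledge store instead of in-memory dictionaries.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict

# Hypothetical annotations; every name and keyword here is invented.
PAPER_TOPICS = {
    "paper-A": {"deep carbon", "diamond"},
    "paper-B": {"diamond", "mantle"},
}
RESEARCHER_INTERESTS = {
    "researcher-X": {"diamond"},
    "researcher-Y": {"serpentinization"},
}
ALL_ANNOTATIONS = {**PAPER_TOPICS, **RESEARCHER_INTERESTS}


def build_keyword_index(annotations):
    """Invert object-to-keywords annotations into keyword-to-objects."""
    index = defaultdict(set)
    for obj, keywords in annotations.items():
        for keyword in keywords:
            index[keyword].add(obj)
    return index


def related_objects(start, annotations):
    """Return the objects that share at least one keyword with start."""
    index = build_keyword_index(annotations)
    found = set()
    for keyword in annotations.get(start, ()):
        found |= index[keyword]
    found.discard(start)  # an object is not related to itself
    return sorted(found)


if __name__ == "__main__":
    print(related_objects("researcher-X", ALL_ANNOTATIONS))
```
</preformat>
      <p>Because papers and researchers are annotated with the same keywords, a single lookup returns both kinds of object, which is the indirect linking described above.</p>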
    </sec>
    <sec id="sec-3">
      <title>Identification and persistence of DCO resources</title>
      <p>The DCO-ID provides a persistent and unique identifier for every resource in the DCO
web portal. It is similar to the DOI for publications, but extends the
scope to many more types of objects, including publications, people, organizations,
instruments, datasets, sample collections, keywords and conferences. The Web
environment may evolve in the future, and the web addresses of the portal and of the
objects registered in it may change. With the DCO-ID, even after 10 or 100
years, one can still find the web address currently associated with an object and retrieve the
information needed. In this way we can keep a persistent and stable legacy for the
activities and outputs of the DCO community. Fig. 2 shows a DCO publication record,
which has both a DOI and a DCO-ID (shown as a code in the “metadata” bar). The
DCO-ID allows users to retrieve more domain-specific annotations from the metadata
of the publication in the portal. The records of community, authors, subject areas and
journal shown in Fig. 2 are all hyperlinks, and each has its own DCO-ID.</p>
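      <p>The idea of keeping an identifier stable while its web address changes can be sketched as follows. This is a toy in-memory registry written in Python under stated assumptions: it does not use the Handle System API, and the identifier and URLs are hypothetical placeholders.</p>
      <preformat preformat-type="code">
```python
class IdentifierRegistry:
    """Toy stand-in for a persistent-identifier service (not the Handle System API)."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        """Bind a new identifier to its first web address."""
        if identifier in self._locations:
            raise ValueError("identifier already registered: " + identifier)
        self._locations[identifier] = url

    def relocate(self, identifier, new_url):
        """Point an existing identifier at a new address; the identifier is unchanged."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        """Return the address currently bound to the identifier."""
        return self._locations[identifier]


if __name__ == "__main__":
    registry = IdentifierRegistry()
    # Hypothetical identifier and addresses, for illustration only.
    registry.register("example/0001", "https://portal.example.org/publication/1")
    registry.relocate("example/0001", "https://archive.example.org/items/1")
    print(registry.resolve("example/0001"))
```
</preformat>
      <p>Citations and links store only the identifier, so moving an object requires one update in the registry and no change to anything that refers to it.</p>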
    </sec>
    <sec id="sec-4">
      <title>State-of-the-art data stewardship</title>
      <p>
        Data stewardship has a two-fold meaning: data management and data service.
Semantic technologies can support both. The sections above focus on the data
management side; we have also made progress on the data service
side. We are working with other organizations to advance the discovery and
usability of science data and other resources. One recent collaboration builds on
the output of the Data Type Registry (DTR) working group [<xref ref-type="bibr" rid="ref5">5</xref>] of the Research Data
Alliance. Each DTR is a self-contained portal for data type registration and curation.
Some common basic types, called “primitives”, will be
registered and managed by a central data type registry. A DTR therefore has a two-level
hierarchy: a list of primitives, and the specific data types defined
within that DTR. This two-level hierarchy prompted our extension of the DCO ontology. In
our work, the basic data types are classes in the DCO ontology, while the specific data
types sit at the instance level: they are all instances of a newly created class
“dco:DataType”, and they are part of our knowledge graph but accessible outside of DCO.
Fig. 3 shows the DCO dataset browser, in which the data type is used as a facet that
helps users find datasets of interest.
      </p>
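      <p>The instance-level modelling of data types and its use as a browsing facet can be sketched in Python as follows. Only the class name dco:DataType comes from the text; the property name dco:hasDataType, the ex: data types and the dataset names are hypothetical and serve only to make the sketch self-contained.</p>
      <preformat preformat-type="code">
```python
from collections import Counter

RDF_TYPE = "rdf:type"
DATA_TYPE_CLASS = "dco:DataType"
HAS_DATA_TYPE = "dco:hasDataType"  # hypothetical property name

# Hypothetical (subject, predicate, object) triples, invented for illustration.
TRIPLES = [
    ("ex:GasChromatogram", RDF_TYPE, DATA_TYPE_CLASS),
    ("ex:SeismicProfile", RDF_TYPE, DATA_TYPE_CLASS),
    ("ex:dataset-1", HAS_DATA_TYPE, "ex:GasChromatogram"),
    ("ex:dataset-2", HAS_DATA_TYPE, "ex:GasChromatogram"),
    ("ex:dataset-3", HAS_DATA_TYPE, "ex:SeismicProfile"),
]


def registered_types(triples):
    """Specific data types are instances of the data type class."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == DATA_TYPE_CLASS}


def facet_counts(triples):
    """Count datasets per registered data type, as a facet sidebar would."""
    types = registered_types(triples)
    return Counter(o for s, p, o in triples if p == HAS_DATA_TYPE and o in types)


def datasets_of_type(triples, data_type):
    """List the datasets annotated with one data type."""
    return sorted(s for s, p, o in triples if p == HAS_DATA_TYPE and o == data_type)


if __name__ == "__main__":
    print(facet_counts(TRIPLES))
    print(datasets_of_type(TRIPLES, "ex:GasChromatogram"))
```
</preformat>
      <p>Keeping the specific data types as instances means a new type can be registered by adding data, without changing the ontology's class hierarchy.</p>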
      <p>
        Besides dataset curation, we also draw on the latest progress in data citation and
sample collection curation. For example, we created a class “dco:GeoSample” in the DCO
ontology, which refers to the global initiative International GeoSample Number [<xref ref-type="bibr" rid="ref6">6</xref>] for the
metadata used to annotate geological samples. We also use the DataCite metadata
schema [<xref ref-type="bibr" rid="ref7">7</xref>] for the citation properties of datasets registered in the DCO web portal.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Concluding remarks</title>
      <p>Our aim for the DCO web portal is to create not just a data portal but a
knowledge portal. By using semantic technologies and state-of-the-art
methods in data stewardship, we built a web portal that supports
various aspects of the DCO community's research. The information collected in the portal, from both
the DCO community and extramural data resources, is stored in ways that both
humans and computers can read and understand. A key feature of the portal, enabled
by the Semantic Web, is the linkage among registered objects and the flexible
ways of presenting them. With linked data we are able to build more and better
collaborations, find like-minded individuals working on common projects, add data that can
be useful to others, discover tools that visualize data in new ways, and
more easily discover, access, understand and use the data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. https://deepcarbon.net/</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lassila</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <year>2001</year>
          .
          <article-title>The Semantic Web</article-title>
          .
          <source>Scientific American</source>
          <volume>284</volume>
          (
          <issue>5</issue>
          ),
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lebo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahoo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>PROV-O: The PROV Ontology</article-title>
          . Accessible at: http://www.w3.org/TR/prov-o/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Maali</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erickson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Data Catalog Vocabulary (DCAT)</article-title>
          . Accessible at: http://www.w3.org/TR/vocab-dcat/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. https://rd-alliance.org/groups/data-type-registries-wg.html</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. http://www.igsn.org/</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. https://www.datacite.org/node</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>