<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yasunori Yamamoto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takatomo Fujisawa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Web of Data</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Data curation</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Database Center for Life Science</institution>
          ,
          <addr-line>178-4-4 Wakashiba, Kashiwa, Chiba 277-0871</addr-line>
          ,
          <country country="JP">JAPAN</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Genetics</institution>
          ,
          <addr-line>1111 Yata, Mishima, Shizuoka 411-8540</addr-line>
          ,
          <country country="JP">JAPAN</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>RDF data are most valuable when they are built in a distributed manner and linked to each other from several aspects, with URIs as the keys. However, we have seen several mismatches among URIs that should be identical, ranging from case discrepancies to misuse of symbols such as '#' and '_'. Therefore, RDF curation is needed to make RDF data more linkable and valuable. Here, we propose an infrastructure with which RDF data constructors can curate their data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Efforts to express huge and diverse life science data in the Resource Description Framework
(RDF) began in the late 2000s, and the number of newly built RDF datasets is still increasing.
Currently, 62 SPARQL endpoints are listed at Umaka-Yummy Data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where
you can check the status of each endpoint, such as how stable it is and how fast it returns a result.
RDF demonstrates its maximum potential when each URI denotes exactly one concept and
each concept is denoted by exactly one URI, since a URI is a global identifier. If this holds,
multiple RDF datasets built in a distributed manner can be easily joined. However, there are
several URI discrepancies among them. In addition to the synonymous URI issue, which also
requires care, these include examples such as the following.
      </p>
      <p>• http://www.w3.org/2000/01/rdf-schema#Label</p>
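      <p>As an illustration of how such discrepancies might be flagged by a machine, the following is a minimal sketch and not the infrastructure proposed in this paper. It assumes that two URIs are candidates for a mismatch when they become identical after lower-casing and after treating '#' and '_' as the same symbol. The function names and the example.org URIs are hypothetical. Because URI paths are case-sensitive in general, the folded key is meant only for reporting candidates to a curator, never for rewriting data automatically.</p>
      <preformat>
```python
from collections import defaultdict


def comparison_key(uri):
    """Fold a URI into a loose key: lower-case it and treat '_' like '#'."""
    return uri.strip().lower().replace("_", "#")


def find_near_duplicates(uris):
    """Return sorted lists of distinct URIs that share the same loose key."""
    groups = defaultdict(set)
    for uri in uris:
        groups[comparison_key(uri)].add(uri)
    # Every set holds at least one URI, so "!= 1" keeps only real collisions.
    return [sorted(group) for group in groups.values() if len(group) != 1]


# Hypothetical sample: the first pair differs only in case,
# the second pair only in the use of '#' versus '_'.
sample = [
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/2000/01/rdf-schema#Label",
    "http://example.org/term#A_1",
    "http://example.org/term_A_1",
    "http://example.org/other",
]

for group in find_near_duplicates(sample):
    print(group)
```
      </preformat>
      <p>Each printed group lists spellings that a curator may want to review and, where appropriate, unify.</p>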
      <p>We consider that these discrepancies stem from the distributed way in which RDF datasets
are built: multiple people and institutions are involved. Therefore, we need not only to call the
community's attention to the problem, but also to construct an infrastructure that minimizes
these discrepancies as much as possible with the help of machines. Here, we propose such an
infrastructure, with which RDF data constructors can curate their data effectively and efficiently.
      </p>
      <p>∗ Corresponding author. † These authors contributed equally.
https://researchmap.jp/yayamamo (Y. Yamamoto); https://researchmap.jp/takatomo (T. Fujisawa)</p>
      <p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>2. RDF data curation infrastructure</title>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This work was supported under the Life Science Database Integration Project, NBDC of the Japan
Science and Technology Agency.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yamamoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Splendiani</surname>
          </string-name>
          ,
          <article-title>Yummydata: providing high-quality open life science data</article-title>
          ,
          <source>Database (Oxford)</source>
          <volume>2018</volume>
          (
          <year>2018</year>
          ). doi:<pub-id pub-id-type="doi">10.1093/database/bay022</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <article-title>Automatic extraction of shapes using shexer</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>238</volume>
          (
          <year>2022</year>
          )
          <fpage>107975</fpage>
          . doi:<pub-id pub-id-type="doi">10.1016/j.knosys.2021.107975</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>