<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yasunori Yamamoto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takatomo Fujisawa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Web of Data</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Data curation</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Database Center for Life Science</institution>
          ,
          <addr-line>ROIS-DS, 178-4-4 Wakashiba, Kashiwa, Chiba 277-0871</addr-line>
          ,
          <country country="JP">JAPAN</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Genetics</institution>
          ,
          <addr-line>1111 Yata, Mishima, Shizuoka 411-8540</addr-line>
          ,
          <country country="JP">JAPAN</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>RDF data are most valuable when they are built in a distributed manner and linked to one another from several perspectives, with URIs as their keys. However, we have observed several mismatches among URIs across RDF datasets that should be identical, for example because of different prefixes or code systems. To address this, we need an infrastructure that treats such URIs as identical by using a URI rewriting dictionary tailored to each RDF dataset. Here, we show some examples of these synonymous URIs and propose an architecture that rewrites URIs when retrieving RDF data from multiple SPARQL endpoints. As a result, users can obtain the properties of a consolidated URI, whereas they would otherwise obtain only those explicitly asserted as triples for each individual URI.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Efforts to represent large and diverse life science data in the Resource Description
Framework (RDF) have emerged since the late 2000s, and the number of newly built RDF datasets
is still increasing. Currently, 65 SPARQL endpoints are listed at Umaka-Yummy Data<sup>1</sup>,
where you can check the status of each endpoint, such as how stable it is and how fast it returns
results. RDF reaches its full potential when each URI denotes exactly one concept and each
concept is denoted by exactly one URI, since a URI is a global identifier. When this holds,
multiple RDF datasets built in a distributed manner can be joined easily. However, there are
several URI discrepancies among them. First, there are typos and misprints within a dataset,
such as the following:
</p>
      <p>https://researchmap.jp/yayamamo (Y. Yamamoto); https://researchmap.jp/takatomo (T. Fujisawa)</p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings.</p>
      <p>All of these URIs denote Homo sapiens. We attribute this issue to the distributed way in
which RDF datasets are built: multiple groups and institutions are involved in building them.
Therefore, in addition to drawing the community’s attention to the problem, we need to construct
an infrastructure that minimizes these mismatches as far as possible with the help of machines.
Here, we propose an infrastructure in which synonymous URIs are treated as identical. While
related works such as sameAs<sup>3</sup>, Identifiers.org<sup>4</sup>, and TogoID<sup>5</sup> already exist, to date there
has been no attempt to provide consolidated results by rewriting URIs in the life science domain.</p>
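      <p>The idea of treating synonymous URIs as identical can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rewriting rules, the choice of the Identifiers.org form as the canonical one, the sample triples, and the names REWRITE_RULES, consolidate, and merge_triples are all assumptions made for illustration. The sketch also uses a single prefix-based dictionary and rewrites subjects only, whereas the proposed infrastructure tailors a dictionary to each RDF dataset.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical rewriting dictionary: synonymous URI prefix -> canonical prefix.
# These entries are illustrative assumptions, not the authors' dictionary.
REWRITE_RULES = {
    "http://purl.uniprot.org/taxonomy/": "http://identifiers.org/taxonomy/",
    "http://purl.obolibrary.org/obo/NCBITaxon_": "http://identifiers.org/taxonomy/",
}


def consolidate(uri: str) -> str:
    """Return the canonical form of uri, or uri unchanged if no rule matches."""
    for synonym_prefix, canonical_prefix in REWRITE_RULES.items():
        if uri.startswith(synonym_prefix):
            return canonical_prefix + uri[len(synonym_prefix):]
    return uri


def merge_triples(result_sets):
    """Group (predicate, object) pairs from several result sets by consolidated subject."""
    merged = defaultdict(set)
    for triples in result_sets:
        for subject, predicate, obj in triples:
            merged[consolidate(subject)].add((predicate, obj))
    return merged


# Made-up results standing in for two SPARQL endpoints (assumed data).
endpoint_a = [
    ("http://purl.uniprot.org/taxonomy/9606",
     "http://www.w3.org/2000/01/rdf-schema#label", "Homo sapiens"),
]
endpoint_b = [
    ("http://purl.obolibrary.org/obo/NCBITaxon_9606",
     "http://example.org/rank", "species"),
]

merged = merge_triples([endpoint_a, endpoint_b])
for subject, properties in merged.items():
    print(subject, sorted(properties))
```
      </preformat>
      <p>In this sketch, both subjects are rewritten to the same key, so the properties asserted separately at the two endpoints are returned together for one consolidated URI.</p>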
    </sec>
    <sec id="sec-2">
      <title>2. URI consolidation</title>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This work was supported under the Life Science Database Integration Project, NBDC of the Japan
Science and Technology Agency.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>