Yasunori Yamamoto

Takatomo Fujisawa

Web of Data

Data curation

0 Database Center for Life Science, 178-4-4 Wakashiba, Kashiwa, Chiba 277-0871, JAPAN

1 National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, JAPAN

2023

RDF data are most valuable when they are built in a distributed manner and linked to each other from several aspects, with URIs as the keys. However, we have observed several mismatches between URIs that should be identical, ranging from case discrepancies to misuse of symbols such as '#' and '_'. Therefore, RDF curation is needed to make RDF data more linkable and valuable. Here, we propose an infrastructure with which RDF data constructors can curate their data.

1. Introduction

Attempts to express huge and diverse life science data in the Resource Description Framework (RDF) began in the late 2000s, and the number of newly built RDF datasets is still increasing. Currently, 62 SPARQL endpoints are listed at Umaka-Yummy Data [1], where you can learn the status of each endpoint, such as how stable it is and how fast it returns a result. Because a URI is a global identifier, RDF demonstrates its maximum potential when each URI denotes exactly one concept and each concept is denoted by exactly one URI. When this holds, multiple RDF datasets built in a distributed manner can be easily joined. However, there are several URI discrepancies among them. In addition to the synonymous URI issue, which also requires care, these include the following examples.

• http://www.w3.org/2000/01/rdf-schema#Label
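As an illustration of how such discrepancies might be flagged mechanically, the following minimal Python sketch groups URIs that become identical once letter case is ignored and the symbols '#', '/' and '_' are treated as interchangeable. It is not the infrastructure proposed in this paper: the function names, the normalization rule, and the example.org URIs are assumptions made for this example only, and the rule is deliberately coarse, so the groups it reports are candidates for manual curation rather than confirmed errors.

```python
from collections import defaultdict
import re


def normalize(uri: str) -> str:
    """Build a comparison key (hypothetical rule, for illustration only).

    Case is folded and '#', '/' and '_' are all mapped to '/', so URIs
    that differ only in those respects receive the same key.
    """
    return re.sub(r"[#/_]", "/", uri.strip().casefold())


def find_suspect_groups(uris):
    """Return sorted groups of distinct URIs that share a comparison key."""
    groups = defaultdict(set)
    for uri in uris:
        groups[normalize(uri)].add(uri)
    # Only keys shared by two or more distinct URIs are suspicious.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Sample input: the rdfs:label term, the miscased variant shown above,
# and two hypothetical example.org URIs differing in '#' versus '_'.
SAMPLE_URIS = [
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/2000/01/rdf-schema#Label",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://example.org/term#A1",
    "http://example.org/term_A1",
]

if __name__ == "__main__":
    for group in find_suspect_groups(SAMPLE_URIS):
        print(" <-> ".join(group))
```

Grouping by a normalized key keeps the check linear in the number of URIs, at the cost of occasionally grouping URIs that are legitimately distinct.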

We consider that these discrepancies stem from the distributed way in which RDF datasets are built: multiple people and institutions are involved. Therefore, we need not only to call the community's attention to the issue, but also to construct an infrastructure that minimizes these discrepancies as much as possible with the help of machines. Here, we propose such an infrastructure, with which RDF data constructors can curate their data effectively and efficiently.

∗Corresponding author.

These authors contributed equally. https://researchmap.jp/yayamamo (Y. Yamamoto); https://researchmap.jp/takatomo (T. Fujisawa)

2. RDF data curation infrastructure

Acknowledgments

This work was supported under the Life Science Database Integration Project, NBDC of Japan Science and Technology Agency.

[1] Yamamoto, Yamaguchi, Splendiani, Yummydata: providing high-quality open life science data, Database (Oxford) 2018 (2018). doi:https://doi.org/10.1093/database/bay022.

[2] Automatic extraction of shapes using shexer, Knowledge-Based Systems 238 (2022) 107975. doi:https://doi.org/10.1016/j.knosys.2021.107975.