<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Finding RDF data you need by Umaka Suite</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yasunori Yamamoto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Atsuko Yamaguchi</string-name>
          <email>atsuko@dbcls.rois.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems</institution>
          <addr-line>178-4-4, Wakashiba, Kashiwa, Chiba 277-0871</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Umaka Suite consists of three tools that help RDF data consumers find the RDF data best suited to their interests. The first searches for SPARQL endpoints relevant to given keywords. The second helps find an endpoint that provides reliable data. The third helps users learn the data structure of an endpoint. Together they are our proposed solution to issues that hinder the further spread of Linked Open Data in the life sciences.</p>
      </abstract>
      <kwd-group>
        <kwd>RDF data discovery</kwd>
        <kwd>RDF data use</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Semantic Web technology has been adopted in the life sciences since
its early stages, and much work has been done to ease the burden of
using heterogeneous datasets in an integrated manner. Thanks to these
efforts, it is easier than ever to find the data we want by issuing
SPARQL queries over multiple SPARQL endpoints (hereafter simply
endpoints). Learning SPARQL may not be easy, but once you have learned
it you can search any dataset exposed through an endpoint. The remaining
issue is finding the right endpoints, that is, those that provide the
desired data. In addition, some endpoints serve datasets that are similar
to each other. In that situation, we want to access the more reliable
endpoint.</p>
      <p>Even after finding the right endpoint, we want to learn its data
structure or schema quickly enough to issue SPARQL queries that retrieve
the desired data.</p>
      <p>Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>The Umaka Suite is our solution to these issues. It consists of three
tools: Umaka Search, Umaka-YummyData, and Umaka Viewer. We
briefly introduce them.</p>
      <p>Umaka Search helps us find the right endpoints. Given a set of
keywords, it returns a list of URLs of the endpoints relevant to them.</p>
      <p>This service is currently under development, and we plan to release
an alpha version within the next year. A related service is Datao, but its
source code is not open, so we cannot tailor it to our purposes.</p>
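      <p>As a rough illustration of keyword-based endpoint search, the following minimal sketch ranks endpoints by how many of the given keywords appear in a short textual description of each endpoint. It is not the Umaka Search implementation, which is not described here; the endpoint URLs, the descriptions, and the function names build_index and search are all hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Hypothetical endpoint descriptions; URLs and texts are illustrative only.
ENDPOINTS = {
    "https://example.org/proteins/sparql": "protein sequence annotation",
    "https://example.org/genes/sparql": "gene expression and gene annotation",
    "https://example.org/compounds/sparql": "chemical compound structure",
}


def build_index(endpoints):
    """Map each lowercase token to the set of endpoint URLs whose description contains it."""
    index = defaultdict(set)
    for url, description in endpoints.items():
        for token in description.lower().split():
            index[token].add(url)
    return index


def search(index, keywords):
    """Return endpoint URLs ranked by how many distinct keywords they match."""
    hits = defaultdict(int)
    for keyword in set(k.lower() for k in keywords):
        for url in index.get(keyword, ()):
            hits[url] += 1
    # Sort by descending match count, then by URL for a stable order.
    return sorted(hits, key=lambda url: (-hits[url], url))


INDEX = build_index(ENDPOINTS)
print(search(INDEX, ["gene", "annotation"]))
```
]]></preformat>
      <p>In this toy setting, an endpoint that matches more of the keywords is listed first, and keywords that match nothing yield an empty list. A real service would presumably index much richer metadata than a one-line description.</p>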
      <p>Umaka-YummyData [1] is a service for finding reliable endpoints and
facilitating mutual understanding between data providers and data
consumers. It introduces the Umaka score to quantify the reliability of
an endpoint. The score is based on several aspects, such as update
frequency, query processing speed, running history, and ontology usage.
Although we do not regard the Umaka score as the only index for
evaluating an endpoint, it can serve as a reference when choosing one.</p>
      <p>In addition, we believe that the score can trigger communication
between data providers and data consumers.</p>
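      <p>To make the idea of a score built from several aspects concrete, the sketch below combines per-aspect values into a single number and uses it to choose between two endpoints that serve similar data. The actual Umaka score formula is not given in this paper; the 0 to 100 scale, the equal default weights, the aspect keys, and the endpoint names are assumptions made only for illustration.</p>
      <preformat><![CDATA[
```python
# Aspect names follow the text; the 0-100 scale and equal weights are assumptions.
ASPECTS = ("update_frequency", "query_speed", "running_history", "ontology_usage")


def aggregate_score(aspect_scores, weights=None):
    """Return the weighted mean of the per-aspect scores (hypothetical formula)."""
    if weights is None:
        weights = {name: 1.0 for name in ASPECTS}
    total_weight = sum(weights[name] for name in ASPECTS)
    weighted_sum = sum(weights[name] * aspect_scores[name] for name in ASPECTS)
    return weighted_sum / total_weight


def more_reliable(candidates):
    """Return the key of the candidate endpoint with the highest aggregated score."""
    return max(candidates, key=lambda name: aggregate_score(candidates[name]))


# Two made-up endpoints serving similar datasets.
CANDIDATES = {
    "endpoint_a": {"update_frequency": 80, "query_speed": 60,
                   "running_history": 100, "ontology_usage": 40},
    "endpoint_b": {"update_frequency": 50, "query_speed": 50,
                   "running_history": 50, "ontology_usage": 50},
}

print(more_reliable(CANDIDATES))
```
]]></preformat>
      <p>Keeping the per-aspect values visible next to the aggregate is one way such a score could also support the conversation between providers and consumers, since it shows which aspect lowers the result.</p>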
      <p>Umaka Viewer presents a graphical representation of the data
structure of a given RDF dataset. Data structure here means the class
hierarchies together with a list of predicates and statistics such as the
number of triples. Umaka Viewer provides an interactive GUI in which we
can zoom in and out to explore the class hierarchies. We can also see
which predicates link which classes.</p>
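      <p>The kind of summary described above, namely which predicate links which classes together with simple counts, can be sketched on a handful of triples as follows. This is only an illustration under stated assumptions, not how Umaka Viewer computes its view: the triples use made-up prefixed names, class hierarchies are left out, and the function name summarize is hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Toy triples (subject, predicate, object) with made-up prefixed names.
TRIPLES = [
    ("ex:p1", "rdf:type", "ex:Protein"),
    ("ex:p2", "rdf:type", "ex:Protein"),
    ("ex:g1", "rdf:type", "ex:Gene"),
    ("ex:g1", "ex:encodes", "ex:p1"),
    ("ex:g1", "ex:encodes", "ex:p2"),
]


def summarize(triples):
    """Count instances per class and triples per (subject class, predicate, object class)."""
    # First pass: record the classes of every typed resource.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    instance_counts = Counter(c for classes in types.values() for c in classes)

    # Second pass: lift each non-type triple to the class level.
    class_links = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for subject_class in types.get(s, ()):
            for object_class in types.get(o, ()):
                class_links[(subject_class, p, object_class)] += 1
    return instance_counts, class_links


INSTANCE_COUNTS, CLASS_LINKS = summarize(TRIPLES)
print(dict(INSTANCE_COUNTS))
print(dict(CLASS_LINKS))
```
]]></preformat>
      <p>Each entry of the second result can be read as an edge from one class to another labeled with a predicate and a triple count, which is the sort of information a graphical viewer can draw.</p>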
    </sec>
    <sec id="sec-2">
      <title>Future Plans</title>
      <p>We intend to release Umaka Search with coverage of as many life
science endpoints as possible. Because retrieving the entire RDF dataset
through the endpoint that serves it is inappropriate, we try to index RDF
datasets that are available for bulk download.</p>
      <p>1. Yasunori Yamamoto, Atsuko Yamaguchi, Andrea Splendiani. "YummyData: providing
high-quality open life science data", Database, 2018.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>