<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>GO enrichment, a revamped synteny viewer and
more in the OMA Ecosystem, Nucleic Acids Research</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1186/s42826-020-00068-8</article-id>
      <title-group>
        <article-title>A Semantic Web-Based Infrastructure for Purpose-Driven Retrieval of Life Science Bioresources</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tatsuya Kushida</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daiki Usuda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Masanobu Yamagata</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Norio Kobayashi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shoichiro Shindo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tatsuya Yamada</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuki Yamagata</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hiroshi Masuya</string-name>
          <email>hiroshi.masuya@riken.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>BioResource Research Center, RIKEN</institution>
          ,
          <addr-line>Koyadai 3-1-1, Tsukuba, Ibaraki</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>RIKEN Information R&amp;D and Strategy Headquarters</institution>
          ,
          <addr-line>2-1 Hirosawa, Wako, Saitama</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>52</volume>
      <issue>485</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>In the life sciences, the shared use of research materials is essential for ensuring experimental reproducibility. These materials are commonly referred to as biological resources. To support life science research, biological resource centers have been established worldwide as institutional platforms for providing such resources. One of the core functions of these centers is the dissemination of information about available materials. The RIKEN BioResource Research Center, one of the major bioresource centers in Japan, has been offering a knowledge-based search system for life scientists since 2018. This system leverages Semantic Web technologies to provide detailed biological characteristics of the resources it manages. By integrating bioresource data, public life science datasets, and ontologies through a SPARQL endpoint backend, the system enables researchers to explore relevant research materials from diverse scientific perspectives via a dedicated search interface. Furthermore, the use of Semantic Web technologies contributes to sustainable and scalable system operation. This report outlines the usefulness of the system based on several years of operation, as well as a new search system developed to address the shortcomings that had been identified.</p>
      </abstract>
      <kwd-group>
        <kwd>biological resource</kwd>
        <kwd>experimental material</kwd>
        <kwd>bioinformatics</kwd>
        <kwd>semantic web</kwd>
        <kwd>ontology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Life science research heavily relies on the biological materials used in experiments. Due to
genetic variation across and within species, experimental outcomes are strongly influenced by
the specific materials employed. Ensuring reproducibility therefore requires the preservation,
maintenance, and shared use of these biological resources.</p>
      <p>To support this need, biological
resource centers have been established worldwide as repositories for research materials. These
centers play a vital role in providing information that helps researchers discover and select
suitable resources.</p>
      <p>A major challenge in information dissemination by these centers lies in addressing two kinds
of diversity: the diversity of the biological resources themselves, and the diversity of research
needs in the life sciences. Life science research spans basic biology to applied fields such as
medicine, environmental science, and energy. It employs a wide range of methodologies, from
macro-level studies of whole organisms to micro-level molecular analyses. Researchers examine
biological functions from multiple perspectives to uncover fundamental mechanisms and
develop new applications.</p>
      <p>
        To support such research, biological resources—including organisms, cells, and DNA—must
be accompanied by integrated information. This information needs to be presented in a way
that is accessible and meaningful to life scientists. The RIKEN BioResource Research Center
(RIKEN BRC) serves as a global repository of experimental materials such as mice, plants, cells,
DNA, and microorganisms [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1–4</xref>
        ]. Since its founding in 2001, RIKEN BRC has offered online
catalogs based on relational databases. In response to evolving research needs—particularly the
need to convey biological characteristics and to support cross-type resource searches—the
center launched a new system in 2018 that utilizes Resource Description Framework (RDF) and
Semantic Web technologies to achieve enhanced data integration [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Since its launch, the
system has run continuously, with regular data updates and with search software added and
improved as research needs evolved.
      </p>
      <p>[Figure: overview of the system. Recoverable labels: "Basic Information of Bioresources in RIKEN BRC", "Web Applications", "Integrated knowledge graphs".]</p>
    </sec>
    <sec id="sec-2">
      <title>2. Overview of the knowledge base and query system</title>
      <p>The bioresource search system at RIKEN BRC consists of multiple web-based search
applications supported by an RDF repository implemented using Virtuoso (OpenLink). These
web applications are integrated into the official RIKEN BRC website (https://web.brc.riken.jp),
providing seamless access to various search functionalities. The backend system and several
web applications are interconnected via APIs based on the SPARQLIST framework [6]. A public
SPARQL endpoint is also available, allowing users to directly query the RDF repository
(https://knowledge.brc.riken.jp/bioresource/sparql).</p>
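      <p>A minimal Python sketch of how a client might query the public endpoint is given here. It assumes only the endpoint URL quoted above and the standard SPARQL 1.1 Protocol and JSON results format. The example query and the helper names (build_request, bindings_to_rows, run_query) are illustrative assumptions and are not part of the RIKEN BRC system.</p>
      <preformat><![CDATA[
```python
import json
import urllib.parse
import urllib.request

# Public endpoint URL quoted in the text above.
ENDPOINT = "https://knowledge.brc.riken.jp/bioresource/sparql"

# Generic example query (an assumption): list a few named graphs.
# It relies on no knowledge of the repository's schema.
EXAMPLE_QUERY = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 10"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(result):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query over the network and return the flattened rows."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bindings_to_rows(json.load(response))
```
]]></preformat>
      <p>Calling run_query(EXAMPLE_QUERY) would return one dictionary per result row, keyed by variable name. Response time and any server-side limits depend on the endpoint and are not specified here.</p>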
      <p>To meet the diverse needs of life science research, the RDF repository integrates not only the
core metadata of RIKEN BRC’s bioresources but also public life science datasets and ontologies
[7–14] (Table 1). These data encompass both intrinsic genetic information and observable
phenotypic traits inferred from genotypes. The data model follows the structural principles
established by the OBO Foundry [15] and other relevant ontological frameworks [16, 17],
ensuring semantic consistency and extensibility. The system employs tailored query patterns
for each data source to enable practical integration of heterogeneous datasets.
Each dataset is managed as a named graph within the repository and is updated regularly. The
integrated graphs include both datasets originally published in RDF and those converted
in-house. The use of RDF technologies promotes interoperability across datasets via shared URIs,
contributing to the long-term sustainability and cost-effectiveness of the knowledge base
infrastructure.</p>
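      <p>The role of named graphs and shared URIs can be illustrated with a small in-memory sketch. All URIs, predicate names, and graph names in it are hypothetical, and a plain list of quads stands in for the RDF repository. It shows two ideas from the paragraph above: a dataset is updated by replacing its named graph, and datasets join on a URI that they share.</p>
      <preformat><![CDATA[
```python
# Quads are (named graph, subject, predicate, object).
# Every URI here is made up for illustration.
QUADS = [
    ("ex:graph/mouse", "ex:strain/001", "ex:hasGene", "ex:gene/G1"),
    ("ex:graph/cell", "ex:cell/042", "ex:hasGene", "ex:gene/G1"),
    ("ex:graph/public", "ex:gene/G1", "ex:associatedWith", "ex:disease/D1"),
]


def replace_graph(quads, graph, new_triples):
    """Update one dataset: drop its named graph, then load the new triples."""
    kept = [quad for quad in quads if quad[0] != graph]
    return kept + [(graph, s, p, o) for s, p, o in new_triples]


def resources_for_disease(quads, disease):
    """Join across graphs on the shared gene URI."""
    genes = {
        s for _, s, p, o in quads
        if p == "ex:associatedWith" and o == disease
    }
    return sorted(
        s for _, s, p, o in quads
        if p == "ex:hasGene" and o in genes
    )
```
]]></preformat>
      <p>Because the mouse, cell, and public graphs all use the same gene URI, no mapping table is needed to connect them. Replacing one graph leaves the others untouched.</p>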
      <p>The knowledge base currently comprises 350 named graphs and 6,899,700,302 triples, and it
continues to be expanded and updated. These figures indicate that the system can integrate
large-scale knowledge within the life science domain.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Search Interfaces</title>
      <p>To enhance usability for bioresource users, we have developed two types of search interfaces.
The first is a simple search function embedded in the top page of the RIKEN BRC website
(https://web.brc.riken.jp). This interface provides a single search box that allows users to search
for bioresources using keywords such as gene names or human disease terms. A suggestion list,
generated by crawling the RDF repository, assists users in formulating their queries and
retrieving relevant results. Search results are presented in a unified format across five different
categories of bioresources (for example, the results returned for the keyword “diabetes”).</p>
      <p>A major technical challenge in implementing this function was poor SPARQL performance
when executing deep queries across multiple graphs. In particular, queries involving complex
inferences or numerous joins tended to have long response times. To overcome this, we create a
simplified RDF graph by crawling the deeper knowledge graph at each data update, and the
search interface operates on this optimized graph.</p>
      <p>This simplified graph, which we call a "shortened knowledge graph", is optimized for
simple keyword searches and basic filtering. It is built by crawling the original deep knowledge
graph and extracting and aggregating only the entities linked to BRC’s bioresources (e.g., genes,
phenotypes, and diseases) together with their primary associated properties (e.g.,
http://purl.obolibrary.org/obo/RO_0002200 (has phenotype)). The result is a reconstructed
graph centered on essential URIs along with their labels, IDs, and classification information.
Because it is specialized for the most frequently used search patterns, such as keyword
searches, its data volume is much smaller and responses are fast. Users obtain results quickly
and can still run more complex, detailed queries against the original deep knowledge graph
when needed. Over several years of operation, this strategy has proven effective in improving
responsiveness and scalability, offering a practical solution to performance challenges in
real-world applications of Semantic Web technologies.</p>
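      <p>The construction of a shortened graph can be sketched as follows. This is a simplified illustration under stated assumptions, not the production crawler: the deep graph is a short list of triples with hypothetical "ex:" URIs, the "ex:hasGene" and "ex:keyword" predicates are invented, and the depth limit is arbitrary. Only the RO_0002200 (has phenotype) URI comes from the text.</p>
      <preformat><![CDATA[
```python
from collections import deque

HAS_PHENOTYPE = "http://purl.obolibrary.org/obo/RO_0002200"  # from the text
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical deep graph: resource -> gene -> phenotype, plus labels.
DEEP = [
    ("ex:strain1", "ex:hasGene", "ex:geneA"),
    ("ex:geneA", HAS_PHENOTYPE, "ex:phenoX"),
    ("ex:geneA", LABEL, "gene A"),
    ("ex:phenoX", LABEL, "abnormal glucose level"),
    ("ex:strain2", "ex:hasGene", "ex:geneB"),
    ("ex:geneB", LABEL, "gene B"),
]


def shorten(triples, resources, max_depth=3):
    """Crawl outward from each resource and emit one-hop keyword triples."""
    links, labels = {}, {}
    for s, p, o in triples:
        if p == LABEL:
            labels.setdefault(s, []).append(o)
        else:
            links.setdefault(s, []).append(o)
    short = []
    for resource in resources:
        seen = {resource}
        queue = deque([(resource, 0)])
        while queue:
            node, depth = queue.popleft()
            if node != resource:
                # Attach the label of every reachable entity to the resource.
                for label in labels.get(node, []):
                    short.append((resource, "ex:keyword", label))
            if depth < max_depth:
                for nxt in links.get(node, []):
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, depth + 1))
    return short


def keyword_search(short, keyword):
    """Single-hop, case-insensitive substring lookup on the shortened graph."""
    needle = keyword.lower()
    return sorted({s for s, _, label in short if needle in label.lower()})
```
]]></preformat>
      <p>The multi-hop traversal runs once per data update inside shorten, so each keyword search touches only flat resource-to-label triples and needs no joins.</p>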
      <p>The utilization of bioresources spans a wide range of fields. Over several years of operation, the
need for a search function linked to more detailed conditions and research outcomes has been
identified. To address this need, in addition to the simple search, we have newly implemented
an advanced search interface (https://knowledge.brc.riken.jp/advanced/en/) for users who wish
to conduct more specific or complex queries. This advanced interface offers four filtering
options:</p>
      <list list-type="order">
        <list-item>
          <p>Ontology-based filtering, which uses hierarchical structures from multiple ontologies,
including Gene Ontology (GO), NCBI Taxonomy, Chemical Entities of Biological Interest
(ChEBI), and Mammalian Phenotype (MP) ontologies [10–13] (Tutorial 1).</p>
        </list-item>
        <list-item>
          <p>Gene similarity-based filtering, which enables users to identify bioresources related to
genes based on sequence or evolutionary similarity, such as orthologs and paralogs
(Tutorial 2).</p>
        </list-item>
        <list-item>
          <p>Literature-based filtering, which allows users to search for bioresources mentioned in
scientific publications (Tutorial 3).</p>
        </list-item>
        <list-item>
          <p>Virus-related filtering, which supports searches based on associations with
infection-related processes, such as those observed in viral diseases like COVID-19 (Tutorial 4).</p>
        </list-item>
      </list>
      <p>By combining these filters, users can generate tailored lists of bioresources that meet complex
research criteria (Tutorial 5).</p>
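      <p>How hierarchical ontology filtering and filter combination might work can be sketched in a few lines of Python. The mini ontology, resource identifiers, and publication identifiers are hypothetical, and combining filters by set intersection is an assumption about the semantics rather than a description of the advanced search implementation.</p>
      <preformat><![CDATA[
```python
# Hypothetical mini ontology: child term -> parent terms.
PARENTS = {
    "T:glucose_metabolism": ["T:metabolism"],
    "T:lipid_metabolism": ["T:metabolism"],
    "T:metabolism": [],
}

# Hypothetical annotations: resource -> ontology terms.
ANNOTATIONS = {
    "R1": {"T:glucose_metabolism"},
    "R2": {"T:lipid_metabolism"},
    "R3": set(),
}

# Hypothetical literature links: resource -> publication IDs.
CITED_IN = {
    "R1": {"P1"},
    "R3": {"P1"},
}


def descendants(term, parents):
    """Return the term together with all of its descendants."""
    children = {}
    for child, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, []).append(child)
    found, stack = {term}, [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def filter_by_ontology(term):
    """Resources annotated with the term or any more specific term."""
    terms = descendants(term, PARENTS)
    return {r for r, annotated in ANNOTATIONS.items() if annotated & terms}


def filter_by_literature(publication):
    """Resources mentioned in the given publication."""
    return {r for r, pubs in CITED_IN.items() if publication in pubs}


def combine(*result_sets):
    """Keep only the resources that pass every filter."""
    return set.intersection(*result_sets) if result_sets else set()
```
]]></preformat>
      <p>Selecting a broad term such as "T:metabolism" matches resources annotated with any narrower term, and combine then narrows the list to resources that also satisfy the literature filter.</p>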
    </sec>
    <sec id="sec-4">
      <title>4. Future Challenges</title>
      <p>Over the past six years, we have developed and operated a public database that provides
information on bioresources, which are fundamental assets in life science research, using
RDF-based Semantic Web technologies. Because this database aims to deliver information grounded in
domain-specific knowledge, the use of RDF has proven effective in reducing operational costs,
supporting sustainable system maintenance, and ensuring alignment with the FAIR principles.
One persistent challenge, however, is the limited performance of RDF repositories when
processing complex queries over deeply nested knowledge graphs. While we have mitigated
this issue by constructing a shortened knowledge graph to improve responsiveness, several
limitations remain, particularly in search functionalities that are standard in
general-purpose systems, such as partial keyword matching and relevance-based ranking. These
features are not natively supported by graph-based RDF technologies.</p>
      <p>To overcome these challenges, we plan to integrate a full-text search engine to enhance search
capabilities. Additionally, the application of large language models (LLMs) has recently
attracted attention as a means to further improve usability. In particular, the adoption of
Domain-Expert-Guided Large Language Models (DEG-LLMs)—a class of LLMs fine-tuned using
expert-curated knowledge—is anticipated to play a key role in future development.</p>
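      <p>To make concrete what partial keyword matching with relevance-based ranking involves, a toy Python sketch follows. It is not the planned full-text engine: the scoring heuristic (exact match, then prefix match, then other substring matches, with shorter labels first) and the sample labels are assumptions chosen only for illustration.</p>
      <preformat><![CDATA[
```python
def rank(labels, query):
    """Return resource IDs whose label contains the query, best match first.

    labels maps a resource ID to a single text label (hypothetical data).
    """
    needle = query.lower()
    scored = []
    for resource, label in labels.items():
        text = label.lower()
        if needle not in text:
            continue  # partial (substring) match is the minimum requirement
        if text == needle:
            score = 3  # exact match
        elif text.startswith(needle):
            score = 2  # prefix match
        else:
            score = 1  # match elsewhere in the label
        # Sort by descending score, then by shorter label, then by ID.
        scored.append((-score, len(text), resource))
    return [resource for _, _, resource in sorted(scored)]
```
]]></preformat>
      <p>A dedicated full-text engine would replace this linear scan with an index and a statistical relevance model, but the inputs and outputs would have the same shape.</p>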
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work is funded by the Management Expenses Grant for RIKEN BioResource Research
Center, MEXT (http://www.mext.go.jp/), and ROIS-DS-JOINT (045RP2024).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>We used a large language model (LLM) to proofread and refine the English expression of this
paper. The content and core ideas of the paper were entirely developed by the authors.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Yokoyama</surname>
            <given-names>KK</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murata</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakade</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kishikawa</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ugai</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kimura</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kujime</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hirose</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masuzaki</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamasaki</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurihara</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Okubo</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakano</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kusa</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshikawa</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inabe</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ueno</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obata</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Genetic materials at the gene engineering division, RIKEN BioResource Center</article-title>
          .
          <source>Exp Anim</source>
          .
          <year>2010</year>
          ;
          <volume>59</volume>
          (
          <issue>2</issue>
          ):
          <fpage>115</fpage>
          -
          <lpage>24</lpage>
          . doi: 10.1538/expanim.59.115.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Yoshiki</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ike</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mekada</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kitaura</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakata</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hiraiwa</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mochida</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ijuin</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kadota</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murakami</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ogura</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abe</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moriwaki</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obata</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>The mouse resources at the RIKEN BioResource center</article-title>
          .
          <source>Exp Anim</source>
          .
          <year>2009</year>
          Apr;
          <volume>58</volume>
          (
          <issue>2</issue>
          ):
          <fpage>85</fpage>
          -
          <lpage>96</lpage>
          . doi: 10.1538/expanim.58.85.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Nakamura</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Bio-resource of human and animal-derived cell materials</article-title>
          .
          <source>Exp Anim</source>
          .
          <year>2010</year>
          ;
          <volume>59</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . doi: 10.1538/expanim.59.1.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Mizuno-Iijima</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakashiba</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ayabe</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakata</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ike</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hiraiwa</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mochida</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ogura</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masuya</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawamoto</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tamura</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obata</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shiroishi</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshiki</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Mouse resources at the RIKEN BioResource Research Center and the National BioResource Project core facility in Japan</article-title>
          .
          <source>Mamm Genome</source>
          .
          <year>2022</year>
          Mar;
          <volume>33</volume>
          (
          <issue>1</issue>
          ):
          <fpage>181</fpage>
          -
          <lpage>191</lpage>
          . doi: 10.1007/s00335-021-09916-x.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Masuya</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usuda</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakata</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuhara</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurihara</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Namiki</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iwase</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takada</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanaka</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzuki</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamagata</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobayashi</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshiki</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kushida</surname>
            <given-names>T.</given-names>
          </string-name>
          Establishment and
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>