<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>LLMDapCat: An LLM-based Data Catalogue System for Data Sharing and Exploration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shang Ferheng Karim</string-name>
          <email>kakashang96@msn.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aisha Kelifa</string-name>
          <email>aishakelifa@outlook.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amanda Marie Holsaeter Kjaer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shanshan Jiang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sondre Sørbø</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dumitru Roman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SINTEF AS</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>OsloMet - Oslo Metropolitan University</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>Good data catalogues are essential for effective data sharing and discovery, given the rapid expansion of datasets and scientific literature available on the Web. In this paper, we present LLMDapCat, an LLM-based metadata and data catalogue system that exploits Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) for efficient data profiling, sharing, and exploration. We demonstrate how the system serves both data providers and consumers: on the one hand, it allows providers to automatically generate standardized and semantically accurate metadata from scientific papers using an LLM and RAG-based pipeline, and to publish the metadata in the catalogue system; on the other hand, it enables consumers to browse available datasets and explore them in chat-like Q&amp;A sessions using an external LLM service. The system can be applied to curate custom domain-specific scientific databases that facilitate domain-specific search, understanding, and exploration. Demo: https://github.com/SINTEF-SE/LLMDapCat_Demo</p>
      </abstract>
      <kwd-group>
        <kwd>Data Catalogue System</kwd>
        <kwd>Large Language Model</kwd>
        <kwd>Retrieval Augmented Generation</kwd>
        <kwd>Data Discovery</kwd>
        <kwd>Data Exploration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A key challenge in the current era of information overload is the limited effectiveness of traditional
keyword-based search over scientific literature and shared datasets. With the exponential growth of
available publications and data, platforms for data discovery must go beyond simple keyword matching.
There is an increasing need for data discovery on the Web that leverages semantic matching, which
requires high-quality, semantically rich metadata to accurately describe published datasets [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This is
in line with the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles of data sharing.
      </p>
      <p>
        Manual annotation of metadata is labor-intensive, often inconsistent, and varies significantly across
domains. Automated approaches augmented with human-in-the-loop feedback mechanisms are emerging as
a promising way to address these limitations. Recent advancements in generative AI, particularly
the development of Large Language Models (LLMs), including both general-purpose models such as
ChatGPT, LLaMA, and Mistral, and domain-specific models such as BioGPT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and BioMedLM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], offer
robust technical capabilities to enhance the scalability, accuracy, and contextual relevance of metadata
annotation. Techniques such as Retrieval Augmented Generation (RAG) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] further contribute to this
progress by integrating external knowledge sources into the generation process, thereby improving
factual grounding and domain adaptation.
      </p>
      <p>Exploiting these recent advances, this paper presents the LLMDapCat web application, including a
Streamlit-based interactive user interface and two LLM/RAG pipelines for metadata generation and
dataset exploration.</p>
      <p>Our main contributions are as follows:
• We introduce a data catalogue system designed for intuitive data sharing, featuring an accessible
user interface (UI) and automated metadata generation using LLMs. The generated metadata is
aligned with domain ontologies to ensure consistency and improve findability.
• We present a method to curate customized, domain-specific scientific databases that support
research exploration and analysis.</p>
    </sec>
    <sec id="sec-2">
      <title>2. LLMDapCat: A Web Application for FAIR Data Sharing</title>
      <p>LLMDapCat provides a ChatGPT-like interface for dataset profiling and discovery. While the system is
demonstrated using biomedical datasets, it can be extended to other scientific domains.</p>
      <sec id="sec-2-1">
        <title>2.1. System Architecture</title>
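        <p>External services referenced in footnotes 2 to 5:
2. NCBI APIs: https://www.ncbi.nlm.nih.gov/home/develop/api
3. BioStudies: https://www.ebi.ac.uk/biostudies
4. ArrayExpress: https://www.ebi.ac.uk/biostudies/arrayexpress
5. Europe PMC RESTful Web Service: https://europepmc.org/RestfulWebService</p>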
        <p>LLMDapCat provides two main user interfaces:
• Provider View: Allows users to submit textual documents and automatically generate metadata
using the LLMDap pipeline.
• Consumer View: Enables users to search, explore, and query datasets using a ChatGPT-style
Q&amp;A interface.</p>
        <p>Metadata Schema: One important input to the system is the metadata schema, which defines metadata
fields according to domain ontologies to ensure semantic alignment and interoperability. The
LLMDap pipeline extracts metadata from scientific papers based on the information contained in the
schema: the name, description, and value range of each metadata field.</p>
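        <p>As a minimal sketch of what such a schema could look like in code, the following Python snippet models each field by its name, description, and allowed values, and checks a generated record against it. The class name, the field names, and the value ranges are hypothetical and chosen only for illustration; they are not taken from the LLMDapCat implementation.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MetadataField:
    """One schema entry: name, description, and allowed values (hypothetical layout)."""
    name: str
    description: str
    allowed_values: Optional[List[str]] = None  # None means free text


# Illustrative schema; the field names and value ranges are made up for this sketch.
EXAMPLE_SCHEMA = [
    MetadataField("organism", "Species the samples were taken from",
                  ["Homo sapiens", "Mus musculus"]),
    MetadataField("study_type", "Kind of experiment reported in the paper",
                  ["RNA-seq", "microarray"]),
    MetadataField("title", "Title of the dataset"),
]


def validate(metadata: Dict[str, Any], schema: List[MetadataField]) -> List[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for f in schema:
        if f.name not in metadata:
            problems.append(f"missing field: {f.name}")
        elif f.allowed_values is not None and metadata[f.name] not in f.allowed_values:
            problems.append(f"value out of range for {f.name}: {metadata[f.name]!r}")
    return problems
```
]]></preformat>
        <p>A check of this kind could run before generated metadata is shown to the provider, so that out-of-range values are flagged for review rather than silently stored.</p>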
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Demonstration</title>
        <sec id="sec-2-2-1">
          <title>Provider View (Figure 2)</title>
          <p>This page allows users (data providers) to submit documents (e.g., research papers) for automated
LLM-based metadata extraction with the following steps:
1. Provide input paper: Users can upload a PDF/XML file or input a URL/PubMed ID.
2. Select schema: Users can use a default or custom JSON schema to guide metadata generation.
3. Process paper: Clicking the “Process Input” button sends the paper and schema to the backend LLM
pipeline for metadata generation.
4. Review metadata: The generated metadata is displayed for user validation or edits.
5. Save results: When confirmed, the metadata is saved to the database and linked with the input
document.</p>
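          <p>The processing step can be pictured with the following minimal Python sketch of a retrieval-augmented extraction loop. It is an illustration under stated assumptions, not the LLMDap pipeline itself: the function names are hypothetical, the retriever is a toy word-overlap ranking standing in for a real retriever, and the LLM is passed in as a plain callable so that no particular LLM service or API is assumed.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, Dict, List


def split_into_chunks(text: str, size: int = 40) -> List[str]:
    """Split the paper into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(chunks: List[str], query: str, k: int = 2) -> List[str]:
    """Toy retriever: rank chunks by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(query_words & set(c.lower().split())),
                    reverse=True)
    return ranked[:k]


def generate_metadata(paper: str, schema: List[Dict[str, str]],
                      llm: Callable[[str], str]) -> Dict[str, str]:
    """For each schema field: retrieve context, prompt the LLM, collect the answer."""
    chunks = split_into_chunks(paper)
    result = {}
    for f in schema:
        context = "\n".join(retrieve(chunks, f["description"]))
        prompt = (f"Context:\n{context}\n\n"
                  f"Field: {f['name']} ({f['description']})\n"
                  "Answer with the value only.")
        result[f["name"]] = llm(prompt).strip()
    return result
```
]]></preformat>
          <p>The returned dictionary corresponds to the metadata that the provider reviews in step 4 before saving it in step 5.</p>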
        </sec>
        <sec id="sec-2-2-2">
          <title>Dataset Browser (Figure 3)</title>
          <p>This page offers dataset browsing and management functionality. It allows users (data providers or
consumers) to browse, search, and select datasets processed and indexed in the catalogue. When a data
consumer selects datasets for Q&amp;A, the Consumer View page opens. In addition, this page
offers an optional database utility that lets users (data providers or system administrators)
trigger an index update manually, on top of the automatic index update that runs when datasets
and their metadata are uploaded.</p>
          <p>1. Browse datasets: A paginated view displays key metadata for available datasets.
2. Search datasets: A search bar allows keyword-based filtering.
3. Select for Q&amp;A: Users select datasets via checkboxes and initiate Q&amp;A by clicking on the “Ask
Questions About Selected Datasets” button shown on the right image of Figure 3.
4. Update index: Users can press the “Rescan Directories &amp; Update Index” button (above the Browse
Datasets header in the left image of Figure 3) to start a background job that rescans and reindexes
datasets.</p>
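          <p>The browse and search steps above amount to keyword filtering followed by pagination. The following Python sketch shows one simple way to express this; the function names and the dictionary-based dataset records are assumptions made for illustration and do not reflect the actual catalogue index.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List


def search(datasets: List[Dict[str, str]], keyword: str) -> List[Dict[str, str]]:
    """Keep datasets whose metadata values contain the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [d for d in datasets
            if any(needle in str(v).lower() for v in d.values())]


def page(items: List[Dict[str, str]], number: int, size: int = 10) -> List[Dict[str, str]]:
    """Return page `number` (1-based) of `items`, with `size` items per page."""
    start = (number - 1) * size
    return items[start:start + size]
```
]]></preformat>
          <p>For example, <monospace>page(search(datasets, "liver"), 1)</monospace> would return the first ten datasets mentioning “liver” in any metadata value.</p>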
          <p>The configuration page (Figure 5) allows selection of LLM models, tuning of parameters (e.g.,
temperature, max tokens), and prompt template customization.</p>
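          <p>The settings named above could be grouped into one small configuration object, as in the following Python sketch. The class, its default values, the model identifier, and the accepted temperature range of 0.0 to 2.0 are all assumptions for illustration; the source does not specify them.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class LLMConfig:
    """Hypothetical container for the settings exposed on the configuration page."""
    model: str = "example-model"  # placeholder identifier, not a real model name
    temperature: float = 0.0
    max_tokens: int = 512
    prompt_template: str = "Context:\n{context}\n\nQuestion: {question}"

    def __post_init__(self) -> None:
        # The bounds below are assumed for this sketch.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")

    def render(self, context: str, question: str) -> str:
        """Fill the prompt template with retrieved context and the user question."""
        return self.prompt_template.format(context=context, question=question)
```
]]></preformat>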
          <p>Semantic Technologies: Metadata fields are aligned with established ontologies to ensure that LLM
outputs are both accurate and interoperable.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusion and Future Work</title>
      <p>
        We have introduced LLMDapCat, a web-based application and LLM-backed pipelines for enabling FAIR
data sharing and exploration. Our approach uses RAG and LLMs to improve the quality and
trustworthiness of generated metadata and Q&amp;A interactions. A quantitative evaluation of the
proposed pipeline showed systemic improvement in
the annotation task [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>In future work, we plan to integrate domain ontologies more tightly into the profiling process to
extend and refine metadata schemas. This will enhance semantic discovery, accuracy, and coverage of
domain-specific metadata. In addition, qualitative user evaluation is planned to validate the usefulness
of the system.</p>
      <p>Potential Impact: The proposed system facilitates semantic data discovery and exploration for
researchers. LLMDapCat can also be used to build customized scientific metadata catalogues in any
domain by tailoring the metadata schema and integrating with domain-specific APIs and ontologies.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>The work is funded through the projects UPCAST (HE 101093216), enRichMyData (HE 101070284), and
DataPACT (HE 101189771).</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The author(s) used GPT-4 for grammar and spelling checks. The author(s) have reviewed and edited
the content and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. F.</given-names>
            <surname>Hagelien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Natvig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Ontology-based semantic search for open government data</article-title>
          ,
          <source>in: 2019 IEEE 13th International Conference on Semantic Computing (ICSC)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>15</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/ICOSC.2019.8665522</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
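          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Poon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>BioGPT: generative pre-trained transformer for biomedical text generation and mining</article-title>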
          ,
          <source>Briefings in Bioinformatics</source>
          <volume>23</volume>
          (
          <year>2022</year>
          )
          <elocation-id>bbac409</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bolton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Venigalla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yasunaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Daneshjou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Frankle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carbin</surname>
          </string-name>
          , et al.,
          <article-title>BioMedLM: A 2.7B parameter language model trained on biomedical text</article-title>
          ,
          <source>arXiv preprint arXiv:2403.18421</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
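          ,
          <string-name>
            <given-names>W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,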
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledge-intensive NLP tasks</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2005.11401. arXiv:2005.11401.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sørbø</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tinn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Karim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roman</surname>
          </string-name>
          ,
          <article-title>LLMDap: LLM-based data profiling and sharing</article-title>
          ,
          <source>in: VLDB 2025 Workshop: 3rd Data EConomy Workshop (DEC)</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Sayers</surname>
          </string-name>
          ,
          <article-title>A general introduction to the E-utilities</article-title>
          ,
          <source>in: Entrez Programming Utilities Help [Internet]</source>
          , National Center for Biotechnology Information (US), Bethesda (MD),
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>U.</given-names>
            <surname>Sarkans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gostev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Athar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Behrangi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Melnichuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Minguet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Rada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Snow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tikhonov</surname>
          </string-name>
          , et al.,
          <article-title>The BioStudies database-one stop shop for all data supporting a life sciences study</article-title>
          ,
          <source>Nucleic Acids Research</source>
          <volume>46</volume>
          (
          <year>2018</year>
          )
          <fpage>D1266</fpage>
          -
          <lpage>D1270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Parkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kapushesky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shojatalab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Abeygunawardena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Coulson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Holloway</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kolesnykov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lilja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lukk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rayner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , E. William,
          <string-name>
            <given-names>U.</given-names>
            <surname>Sarkans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brazma</surname>
          </string-name>
          ,
          <article-title>ArrayExpress-a public database of microarray experiments and gene expression profiles</article-title>
          ,
          <source>Nucleic Acids Research</source>
          <volume>35</volume>
          (
          <year>2006</year>
          )
          <fpage>D747</fpage>
          -
          <lpage>D750</lpage>
          . URL: https://doi.org/10.1093/nar/gkl995. doi:
          <pub-id pub-id-type="doi">10.1093/nar/gkl995</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosonovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Levchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bhatnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Chandrasekaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Faulk</surname>
          </string-name>
          , I. Hassan,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jeffryes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. I.</given-names>
            <surname>Mubashar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nassar</surname>
          </string-name>
          , M. Jayaprabha Palanisamy,
          <string-name>
            <given-names>M.</given-names>
            <surname>Parkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Poluru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rogers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Selim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shafique</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ide-Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stephenson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tirunagari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Venkatesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <article-title>Europe PMC in 2023</article-title>
          ,
          <source>Nucleic Acids Research</source>
          <volume>52</volume>
          (
          <year>2023</year>
          )
          <fpage>D1668</fpage>
          -
          <lpage>D1676</lpage>
          . URL: https://doi.org/10.1093/nar/gkad1085. doi:
          <pub-id pub-id-type="doi">10.1093/nar/gkad1085</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Tinn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sørbø</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Voutetakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Giounis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pilalis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Papadodima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roman</surname>
          </string-name>
          ,
          <article-title>Pre-meta: Priors-augmented retrieval for LLM-based metadata generation</article-title>
          ,
          <source>Bioinformatics</source>
          (
          <year>2025</year>
          ).
          <comment>Accepted for publication</comment>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>