<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring the Potential of Artificial Intelligence based Chatbots for Generating Federated SPARQL Queries over Bioinformatics Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sourav Maiti</string-name>
          <email>souravmaiti@rcsi.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qurratal Ain Fatimah</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Syeda Mah-e-Fatima</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ali Hasnain</string-name>
          <uri>https://www.rcsi.com/people/profile/alihasnain</uri>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Pharmacy and Biomedical Sciences, Royal College of Surgeons in Ireland</institution>
          ,
          <addr-line>Dublin</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University Hospital Galway</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>26</volume>
      <issue>2024</issue>
      <abstract>
        <p>In this paper we investigate the efficacy of five AI bots (ChatGPT, Gemini, Copilot, Chatsonic and YouChat) in formulating simple and complex federated SPARQL queries across the DBpedia, DrugBank and KEGG databases. By comparing the generated queries with manually created ones, we reveal the bots' capabilities and limitations. Our findings highlight the potential of AI in data science and healthcare research, offering insights into cross-domain query generation and its implications for interdisciplinary collaboration.</p>
      </abstract>
      <kwd-group>
        <kwd>AI Chatbots</kwd>
        <kwd>Federated queries</kwd>
        <kwd>SPARQL</kwd>
        <kwd>Healthcare and Life Science Datasets</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        The deluge of data in biological databases offers a diverse range of information in the healthcare
and life sciences domain. These databases give researchers, scientists
and working professionals opportunities to accelerate discoveries, develop new hypotheses and identify
novel patterns [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. On the other hand, retrieving information from these large databases requires sophisticated
storage and retrieval systems, which is
a challenge for researchers and scientists [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Most biological databases published as RDF
Knowledge Graphs rely on complex query languages such as SPARQL (SPARQL Protocol and
RDF Query Language) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to retrieve information. With little or no technical
knowledge, researchers and domain users are unable to write accurate and reliable SPARQL
queries, which can become a bottleneck in exploiting the full potential of these databases [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
SPARQL provides a standardised way to query
information from databases [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Many biological databases use the RDF (Resource
Description Framework) data model, which represents information as interconnected
triples (subject, predicate, object) and is well suited to complex biological relationships such as protein
functions and gene interactions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. RDF data is made available via SPARQL endpoints, and the
SPARQL query language was specifically designed to query RDF data, allowing for efficient
retrieval of information [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Since many databases use the RDF standard, SPARQL queries can be
used to access and query different databases, helping researchers to integrate data from various
databases [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Like other query languages, SPARQL has a learning curve and,
for non-technical biological researchers and scientists with limited experience, composing complex
queries can limit access to the data they need for their research.
      </p>
      <p>
        As noted above, Healthcare and Life Sciences Knowledge Graphs are generally
distributed across multiple locations and are sometimes published in different formats. Searching, querying
and integrating data from sources in different formats can therefore be
challenging for working biologists without technical knowledge or experience. Moreover,
creating federated queries that retrieve and combine information from multiple databases
requires technical knowledge of RDF and SPARQL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
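      <p>As a minimal sketch of what such a federated query looks like, the Python snippet below assembles a SPARQL 1.1 query in which each SERVICE block addresses a different endpoint and the blocks are joined through a shared variable. The helper name build_federated_query, the example.org endpoints and the ex: predicates are hypothetical placeholders and do not reflect the vocabulary of any real dataset.</p>
      <preformat preformat-type="code"><![CDATA[
```python
"""Sketch: assemble a federated SPARQL 1.1 query from per-endpoint patterns."""


def build_federated_query(prefixes, variables, services):
    """Return a SELECT query with one SERVICE block per remote endpoint.

    prefixes:  dict mapping a prefix label to its namespace IRI.
    variables: variable names to project, without the leading '?'.
    services:  list of (endpoint_url, graph_pattern) pairs.
    """
    header = ["PREFIX %s: <%s>" % (label, iri) for label, iri in prefixes.items()]
    projection = " ".join("?" + name for name in variables)
    blocks = []
    for endpoint_url, pattern in services:
        # SERVICE sends the enclosed graph pattern to the named endpoint.
        blocks.append("  SERVICE <%s> {\n    %s\n  }" % (endpoint_url, pattern))
    select = "SELECT DISTINCT %s WHERE {\n%s\n}" % (projection, "\n".join(blocks))
    return "\n".join(header + [select])


# Hypothetical namespace, endpoints and predicates, for illustration only.
query = build_federated_query(
    {"ex": "http://example.org/vocab#"},
    ["drug", "mass"],
    [
        ("http://drugs.example.org/sparql", "?drug ex:casNumber ?cas ."),
        ("http://compounds.example.org/sparql", "?compound ex:casNumber ?cas ; ex:mass ?mass ."),
    ],
)
print(query)
```
]]></preformat>
      <p>The two patterns share the variable ?cas, which is what lets the query engine join the rows returned by the two endpoints.</p>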
      <p>
        Furthermore, once the data has been extracted it requires further processing, and without
coding skills it is a challenge for researchers to interpret and explain the results accurately
and consistently, potentially leading to misinterpretations [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Many of these databases focus
on developing new functionality and increasing data volume rather than
improving the UI (User Interface). This limits their user base: they are usable
and interpretable only by researchers with an understanding of RDF and SPARQL [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Significant work has already been done to help domain users and working biologists
either formulate complex queries through user interfaces [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] or with the help of visualisations
[
        <xref ref-type="bibr" rid="ref9">9, 10</xref>
        ]. Similarly, Hasnain et al. [11, 12] presented SPORTAL, a system
that collects metadata about the content of endpoints into a central catalogue
that clients can search. This service addresses the problem of helping clients find
relevant SPARQL endpoints on the Web.
      </p>
      <p>
        These hand-built services are designed to help researchers access, search and query
relevant RDF databases. With the advent of generative AI tools and chatbots, however,
such tools are increasingly being used in nearly every walk of life. Chatbots make it
possible to ask questions in natural language. Although the current generation
has limitations that can introduce errors and bias into the results, researchers and working
professionals have started using them for a variety of purposes. More recently, Ana et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
explored the potential of Artificial Intelligence chatbots (e.g., ChatGPT) for data exploration of
federated bioinformatics knowledge graphs. We believe that the performance of other chatbots
in generating SPARQL queries to access data from SPARQL endpoints is also worth exploring.
      </p>
      <sec id="sec-2-1">
        <title>1.1. Related Work</title>
        <p>Generative AI tools, e.g. OpenAI’s ChatGPT and Google Gemini, build on NLP (Natural Language
Processing) [13] and GPT (Generative Pre-trained Transformer) [14], which form their core architecture.
Researchers and scientists appear to be using generative AI tools to combine
their biological knowledge with the power of AI, accessing and retrieving information from multiple
biological databases with little or no
knowledge of SPARQL or the RDF structure.</p>
        <p>The complexity, syntax and steep learning curve of SPARQL pose a major problem.
NLP [13] can partly address it by allowing queries to be written in
natural language, which removes the need for deep knowledge of the query language [15].
When a complex query must gather data from multiple databases, NLP
can also help break the underlying biological question down into smaller and simpler
steps [15][13].</p>
        <p>
          GPT leverages the Transformer architecture [16] to analyse and process entire
sentences simultaneously. Its predecessor, the Recurrent Neural Network (RNN),
processes information sequentially, so it can forget, and subsequently ignore, information
learnt earlier when interpreting the current input [14]. The Transformer's parallel processing
allows GPT to learn and understand complex patterns and relationships within
a sentence or paragraph [16][
          <xref ref-type="bibr" rid="ref5">5</xref>
          ][17]. GPTs can help researchers with limited technical
knowledge to create SPARQL queries, generate explanations of them and answer questions
about SPARQL or the databases [18]. Here GPT acts as a bridge between
natural language [15] and the complexities of querying databases. GPT can translate a
natural-language question [14][18] into the corresponding SPARQL query, removing the need for
researchers to learn SPARQL's complex syntax and the complete RDF structure [19]. Similarly,
the results of SPARQL queries can be complex and sometimes overwhelming for
non-technical users; a GPT-based system can assist by analysing the results, explaining
complex biological information in simple language, highlighting key findings and presenting them
in a user-friendly format [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ][18]. In short, NLP and GPT together can support
non-technical researchers in writing SPARQL queries and accessing vast numbers of biological
databases [15].
        </p>
        <p>
          Since Google introduced the Transformer model [16] in 2017, the core architecture
behind ChatGPT and other GPTs [20][19], there has been a significant rise in the release of
generative and conversational AI tools [17], some of the most popular being ChatGPT [21],
Google Gemini [22], Microsoft Copilot, Chatsonic and YouChat. This paper reviews the performance
of these GPT-based (Generative Pre-trained Transformer) [16] tools in generating simple
SPARQL queries that access a single biological database, as well as complex
federated SPARQL queries [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] that access and combine data from multiple biological databases,
given prompts written in natural language (English).
        </p>
        <p>
          In this paper, we report our findings on generating SPARQL queries through AI
chatbots to access healthcare and life science databases. Our experiments considered five
chatbots, namely ChatGPT, Gemini, Microsoft Copilot, Chatsonic and YouChat, and three
databases: DBpedia (cross-domain, including healthcare), and
DrugBank and KEGG (Kyoto Encyclopedia of Genes and Genomes), which cover data on drugs
and genes respectively. Using these chatbots, we tried to generate single-source
(non-federated) queries as well as simple and complex federated queries, all written by humans and presented earlier in
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. We selected these five queries because they are representative of the overall set of
simple and complex SPARQL queries and cover a wide range of SPARQL constructs (SELECT, OPTIONAL,
FILTER etc.). Out of all the queries presented in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], we
selected S8, S11, S14 and C1; the complete list of queries is also available at
https://shorturl.at/cfoAE. In essence, we asked each of the five
chatbots the following four questions, considering three databases and five SPARQL queries:
          <list list-type="order">
            <list-item><p>What is (Dbpedia, Drugbank, KEGG) database? Limit the answer to 60 words.</p></list-item>
            <list-item><p>Does database (Dbpedia, Drugbank, KEGG) has a SPARQL endpoint, provide its URL?</p></list-item>
            <list-item><p>Build the query by providing specific prompts</p></list-item>
            <list-item><p>What is the result of this query (created in previous step)?</p></list-item>
          </list>
In this paper we focus on checking the syntactic correctness of the SPARQL queries generated
by these bots. Evaluating the results, their correctness and completeness, the
retrieval time of results, or how the complexity of the queries could be reduced is out of scope.
        </p>
        <p>The remainder of this paper is organised as follows: Section 2 describes
the AI chatbots, prompts, queries and the improved prompts provided to the
chatbots. Section 3 provides the simple and federated SPARQL queries generated by the AI chatbots,
whereas Section 4 provides the responses of the AI chatbots when asked for the results of the
queries. Section 5 provides a detailed discussion of the results generated by the AI chatbots regarding
database descriptions, endpoints and SPARQL queries, and Section 6 addresses the limitations
of using AI chatbots to generate SPARQL queries. Lastly, Section 7 provides the conclusion.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Use Cases</title>
      <p>By design, the sensitive nature of healthcare data leads to fragmented, siloed and mostly private
repositories. Some of those repositories are publicly available as RDF graphs and can be
queried via SPARQL endpoints. This section describes the chatbots used, the
databases considered, their SPARQL endpoints (if available), the original SPARQL queries and
the “Improved Prompts for Chatbots”.</p>
      <sec id="sec-3-1">
        <title>2.1. Chatbot Used:</title>
        <p>Table 2.1 lists the chatbots together with the date accessed, version, year developed and developer,
along with a coverage statement for each bot.</p>
        <table-wrap>
          <label>Table 2.1</label>
          <table>
            <thead>
              <tr>
                <th>Chatbot</th>
                <th>Date accessed</th>
                <th>Version</th>
                <th>Year</th>
                <th>Developer</th>
                <th>Coverage</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>ChatGPT</td>
                <td>11-03-2024</td>
                <td>GPT-3.5</td>
                <td>2020</td>
                <td>OpenAI</td>
                <td>Covers a range of topics, including but not limited to general knowledge, science, technology, literature, history etc. It is designed to engage in natural language conversations and assist users with various inquiries and tasks.</td>
              </tr>
              <tr>
                <td>Gemini</td>
                <td>11-03-2024</td>
                <td>1.0 Pro</td>
                <td>N/A</td>
                <td>Google AI</td>
                <td>Capabilities are constantly evolving, but it can access and process information from the real world through Google Search and keep the response consistent with results.</td>
              </tr>
              <tr>
                <td>Copilot</td>
                <td>14-03-2024</td>
                <td>GPT-4 model</td>
                <td>2023</td>
                <td>Microsoft</td>
                <td>It is an advanced chatbot based on the powerful GPT-4 model and offers a wide range of capabilities, operates in multiple languages, and is available both as a free version and a paid Copilot Pro version.</td>
              </tr>
              <tr>
                <td>Chatsonic</td>
                <td>12-03-2024</td>
                <td>Advanced AI</td>
                <td>2021</td>
                <td>Writesonic</td>
                <td>It can assist with a range of topics including but not limited to general knowledge, writing assistance, research etc.</td>
              </tr>
              <tr>
                <td>YouChat</td>
                <td>12-03-2024</td>
                <td>YouChat2.0</td>
                <td>2023</td>
                <td>You.com</td>
                <td>YouChat is an AI language model designed to provide assistance, information, and insights on a wide range of topics. It is capable of engaging in detailed discussions and answering factual questions based on available information.</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Database Descriptions</title>
        <p>This section presents descriptions of the databases involved in our experiments. We
asked the different chatbots to describe each database themselves.</p>
        <p>DBpedia: When we asked the different chatbots “What is DBpedia database? Limit the answer to 60
words”, we obtained the responses presented in Table 2.2.</p>
        <sec id="sec-3-2-1">
          <title>Database Descriptions</title>
          <p>DBpedia is a structured database extracted from Wikipedia, containing structured information about
topics covered in the encyclopedia. It transforms unstructured Wikipedia content into a machine-readable
format, enabling easier access and utilisation of Wikipedia’s knowledge. DBpedia organises information
into RDF triples, making it useful for semantic web applications and knowledge graphs.</p>
          <p>DBpedia is essentially a giant knowledge base built from Wikipedia. It extracts information from
Wikipedia articles and organises it in a structured format, making it easily searchable and usable
by machines. Imagine turning the vast knowledge in Wikipedia into a database that computers can
understand. This allows you to ask complex questions about things mentioned on Wikipedia, like finding
cities in a specific state or artists from a particular era.</p>
          <p>DBpedia is a community-driven project that extracts structured content from various Wikipedia projects,
including Wikipedia. It creates an open knowledge graph (OKG) available on the web, allowing users to
semantically query relationships and properties of Wikipedia resources. DBpedia is a valuable resource
for researchers, industry professionals, and anyone interested in linked open data.</p>
          <p>DBpedia extracts structured content from Wikipedia and makes it available on the web. It provides a
large, multilingual knowledge base with information about notable people, places, organisations, and
things. DBpedia allows users to query this information and explore the relationships between different
entities. It’s a valuable resource for data integration, knowledge exploration, and semantic querying.</p>
          <p>DBpedia is a large-scale, multilingual knowledge base extracted from Wikipedia, aiming to provide
structured content from the information created in the Wikipedia project. It allows users to find
answers to questions where the information is spread across multiple Wikipedia articles. The structured
information is made available on the web as Linked Open Data, adhering to Linked Data principles.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Database Description</title>
          <p>DrugBank: When we asked the different chatbots “What is Drugbank database? Limit the answer
to 60 words”, we obtained the responses presented in Table 2.2.</p>
          <p>DrugBank is a comprehensive, freely accessible database containing information on drugs, targets,
and interactions. It provides chemical structures, pharmacological actions, and mechanisms of action
for drugs. Researchers and healthcare professionals use DrugBank for drug discovery, pharmacology
research and clinical applications as it provides drug properties and interactions.</p>
          <p>Drugbank is a free online library related to drugs and their targets. It is a giant encyclopedia specifically
for drugs that provides detailed information on chemical structure, how drugs interact with the body,
and even links to ongoing research. This comprehensive resource is used by researchers, doctors, and
even the general public to understand how medications work and their potential effects.</p>
          <p>DrugBank is a biomedical knowledgebase that provides structured drug data for research, discovery,
and software. It contains information on over 500,000 drugs and drug products, making it a valuable for
researchers and industry professionals interested in linked open data. DrugBank’s technology seeks out
relevant evidence, builds deep connections, and delivers usable, trustworthy drug knowledge.</p>
          <p>DrugBank is a comprehensive resource that combines drug data with target information and contains
extensive information about drugs, their mechanisms, interactions and targets. It is widely used by the
pharmaceutical industry, medicinal chemists and researchers for drug discovery and development. It is a
valuable tool for understanding the properties and effects of various pharmaceutical compounds.</p>
          <p>DrugBank is a comprehensive, freely accessible online database containing information on drugs and
targets. It combines detailed drug data, including chemical, pharmacological, and pharmaceutical
information, with target information, such as sequence, structure, and pathway details. DrugBank is
widely used by the drug industry, medicinal chemists, pharmacists, physicians, students and the public.</p>
          <p>KEGG: When we asked the different chatbots “What is KEGG database? Limit the answer to 60 words”,
we obtained the responses presented in Table 2.2.</p>
          <p>The KEGG database is a resource for understanding biological pathways, genomes, diseases, drugs and
chemical substances. It offers curated data on molecular interactions and functions, aiding research in
bioinformatics, systems biology and drug discovery. KEGG’s pathway maps and analysis tools facilitate
the exploration of complex biological processes and their relationships.</p>
          <p>KEGG, short for Kyoto Encyclopedia of Genes and Genomes, is a database powerhouse for understanding
biological systems. Launched in 1995, it offers a wealth of information on genes, proteins, pathways, and
diseases. Imagine KEGG as a central hub connecting the dots between genes, their functions, and how
they work together in cellular processes. This knowledge base is a valuable tool for researchers in areas
like genomics, drug development, and understanding the intricate workings of living organisms.</p>
          <p>KEGG is a comprehensive biological knowledgebase that provides valuable insights into the molecular
interactions, pathways, diseases, drugs, and chemical substances. Researchers and educators utilise
KEGG for bioinformatics studies, including genomics, metagenomics, and metabolomics. It encompasses
data analysis, modeling, and simulation across various omics fields.</p>
          <p>KEGG is a comprehensive database that integrates genomic, chemical, and systemic functional
information. It offers a wide array of data related to biological pathways, diseases, drugs, and chemical
substances. KEGG is widely utilised in bioinformatics and computational biology for studying molecular
interactions and cellular processes. It provides valuable insights into the relationships between genes,
proteins, and biochemical pathways.</p>
          <p>KEGG is a database resource for understanding high-level functions and utilities of the biological system,
such as the cell, the organism, and the biosphere, from genomic and molecular-level information. It
provides a computer model of the biological system, integrating molecular building blocks of genes and
proteins, chemical substances, and molecular wiring diagrams of interaction and reaction networks.
KEGG also offers a REST-style Application Programming Interface for accessing its database resources.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>2.3. SPARQL Endpoints</title>
        <p>Some databases provide endpoints for querying their data. It is therefore important to ask
each chatbot directly whether the database under consideration has an endpoint available. We asked the
following questions through the different chatbots.</p>
        <p>DBpedia endpoint: When asked “Does DBpedia database has a SPARQL endpoint? What is
its URL?”, the following results were generated:
          <list list-type="order">
            <list-item><p>ChatGPT: Yes, DBpedia provides a SPARQL endpoint for querying its database. The URL
for the DBpedia SPARQL endpoint is: http://dbpedia.org/sparql.</p></list-item>
            <list-item><p>Gemini: Yes, DBpedia offers a public SPARQL endpoint for querying its data. You can
access it directly through the following URL: https://dbpedia.org/snorql/</p></list-item>
            <list-item><p>Copilot: DBpedia provides a public endpoint: http://dbpedia.org/sparql</p></list-item>
            <list-item><p>Chatsonic: Yes, DBpedia has a SPARQL endpoint: https://dbpedia.org/sparql.</p></list-item>
            <list-item><p>YouChat: Yes, the DBpedia has a SPARQL endpoint: http://dbpedia.org/sparql.</p></list-item>
          </list>
        </p>
        <p>Drugbank endpoint: When asked “Does Drugbank database has a SPARQL endpoint? What
is its URL?”, the following results were generated:
          <list list-type="order">
            <list-item><p>ChatGPT: As of my last update in January 2022, DrugBank does not have a publicly
available SPARQL endpoint.</p></list-item>
          </list>
        </p>
        <p>KEGG endpoint: When asked “Does KEGG database has a SPARQL endpoint? What is its
URL?”, the following results were generated:
          <list list-type="order">
            <list-item><p>ChatGPT: As of my last update in January 2022, KEGG does not provide a SPARQL
endpoint for querying its database</p></list-item>
            <list-item><p>Gemini: KEGG doesn’t have its own SPARQL endpoint. For SPARQL access, consider:
Downloading KEGG data: Various formats exist (https://www.kegg.jp/kegg/download/).
Third-party endpoints: Bio2RDF offers SPARQL access to KEGG data.</p></list-item>
            <list-item><p>Copilot: KEGG provides a public SPARQL endpoint. Remember to review the
documentation and usage notes before running any queries.</p></list-item>
            <list-item><p>Chatsonic: The KEGG database offers a SPARQL endpoint, and its URL is
https://www.genome.jp/sparql/linkdb. This endpoint allows users to execute SPARQL
queries to retrieve specific information from the LinkDB RDF, facilitating data access and
exploration within the KEGG database.</p></list-item>
            <list-item><p>YouChat: The URL for the KEGG SPARQL endpoint is not provided in the search results.</p></list-item>
          </list>
        </p>
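        <p>One way to follow up on the endpoint URLs reported by the chatbots is to send each a trivial probe query. The sketch below, which is not part of the study, only builds the request URL: under the SPARQL 1.1 Protocol a query sent via HTTP GET is passed in the query parameter. The helper name probe_url and the choice of an ASK probe are assumptions made for illustration, and no network request is issued.</p>
        <preformat preformat-type="code"><![CDATA[
```python
"""Sketch: build the HTTP GET URL for a SPARQL probe query (no request is sent)."""
from urllib.parse import urlencode, urlsplit


def probe_url(endpoint_url, query="ASK { ?s ?p ?o }"):
    """Return the GET URL that would submit `query` to `endpoint_url`.

    The SPARQL 1.1 Protocol passes the query text in the `query` parameter.
    """
    separator = "&" if urlsplit(endpoint_url).query else "?"
    return endpoint_url + separator + urlencode({"query": query})


# URLs exactly as reported by the chatbots for DBpedia in this section.
reported = {
    "ChatGPT": "http://dbpedia.org/sparql",
    "Gemini": "https://dbpedia.org/snorql/",
    "Copilot": "http://dbpedia.org/sparql",
    "Chatsonic": "https://dbpedia.org/sparql",
    "YouChat": "http://dbpedia.org/sparql",
}

for chatbot, url in reported.items():
    print(chatbot, probe_url(url))
```
]]></preformat>
        <p>Whether a reported URL actually answers such a request has to be checked separately, for example by fetching the printed URL and inspecting the HTTP status.</p>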
      </sec>
      <sec id="sec-3-4">
        <title>2.4. SPARQL Queries</title>
        <p>
          We considered five different hand-written SPARQL queries: one
non-federated query, three simple federated queries and one complex
federated query, all provided by Saleem et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. We took the English-language description
of each query and tried to generate the query with the chatbots. Section 2.5 gives the English
descriptions of the original queries, whereas Section 2.6 gives the “Improved Prompts written
for Chatbots” used to assess the results and performance of these chatbots.
        </p>
      </sec>
      <sec id="sec-3-5">
        <title>2.5. Original Queries</title>
        <p>
          As mentioned before, out of all the queries presented in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], for this experiment we selected
S8, S11, S14 and C1; the complete list of queries is also available at:
https://shorturl.at/cfoAE.
        </p>
        <list list-type="order">
          <list-item><p>Q1: Find all drugs from Drugbank (single source query)</p></list-item>
          <list-item><p>Q2: Find all drugs from Drugbank and DBpedia with their melting points (simple
federated query)</p></list-item>
          <list-item><p>Q3: Find all the equations of reactions related to drugs from category Cathartics
and their drug description (simple federated query)</p></list-item>
          <list-item><p>Q4: Find drugs that affect humans &amp; mammals for those having a description of
their biotransformation, also return this description (simple federated query)</p></list-item>
          <list-item><p>Q5: Find the equations of chemical reactions and reaction title related to drugs
with drug description and drug type ’smallMolecule’. Show only those whose
molecular weight average larger than 114 (complex federated query)</p></list-item>
        </list>
      </sec>
      <sec id="sec-3-6">
        <title>2.6. Improved Prompts for Chatbots</title>
        <p>We rewrote the original English descriptions of these queries (Section 2.5) as Improved
Prompts for Chatbots in order to exploit the full potential of these chatbot systems.</p>
        <p>In this work, we ran each improved prompt a single time. The prompts were not revised or
improved afterwards, and the results were analysed based on the syntax of the query generated in
that first run. Incremental improvement of the prompts, checking
the correctness of the results, and simplifying the queries were out of scope.</p>
        <p>To improve the original English descriptions, we added more detail
by explicitly stating i) the query language to be used, ii) the task the chatbot needs to perform,
iii) which databases it needs to get data from and iv) what specific data we are looking for. For
federated queries, we explicitly stated which features to collect from each data source
and how to combine the data from these sources. The methodology is detailed further
in Section 5 of this paper. Prompt improvement, or prompt engineering [23],
is an emerging field with a lot of ongoing work. In summary, the better
the prompt, the better the response from these chatbots. P1 is the prompt for Q1, and
likewise for the other queries, i.e. P &lt;==&gt; Q.</p>
        <list list-type="order">
          <list-item><p>P1: Write a single SPARQL query to retrieve the complete list of drugs from the Drugbank
database. Use DrugBank endpoint URL (http://drugbank.bio2rdf.org/sparql) to retrieve a
list of all distinct registered drugs in the Drugbank database.</p></list-item>
          <list-item><p>P2: Write a single federated SPARQL query to retrieve all drugs from two biological
databases: DBpedia and Drugbank, along with their corresponding melting points. The
result from the query must be combined results from both databases.</p></list-item>
          <list-item><p>P3: Generate a single federated SPARQL query to obtain drugs data from the category
“Cathartics”, their corresponding enzymes, chemical reactions, from DrugBank and KEGG
databases. Use the DrugBank database, to obtain drugs in the “Cathartics” category. For
each Cathartics drug identified, extract its description and KEGG compound ID. Use the
retrieved KEGG compound IDs from Drugbank to find corresponding enzyme entries in
the KEGG database and obtain the associated equations of reactions within KEGG.</p></list-item>
          <list-item><p>P4: Generate a single federated SPARQL query to retrieve all drugs from DrugBank
biological database that affect “humans and mammals”. For each drug found, get its CAS
registry number. Using the CAS number from DrugBank, find the same drug in KEGG
biological database and obtain the mass information for each drug. Also, retrieve the
drug label, description of their bio-transformation from DrugBank, if they are available.</p></list-item>
          <list-item><p>P5: Generate a single federated SPARQL query to retrieve distinct drugs and get
information like drug description, molecular weight average, compound, reaction title, and
chemical equation. Use the DrugBank and KEGG databases. From DrugBank, select drugs
with the drug type “small molecule” and retrieve their descriptions and KEGG compound
IDs. Then, from KEGG, obtain information about enzymes, chemical reactions, reaction
titles and chemical equations associated with the selected compounds. Finally, include an
optional clause to filter drugs with a molecular weight average greater than 114.</p></list-item>
        </list>
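        <p>The four elements of an improved prompt described in this section can be read as a simple template. The sketch below is an illustration only, assuming a hypothetical helper build_prompt and illustrative wording loosely modelled on P4. It is not the procedure used to write P1 to P5, which were composed by hand.</p>
        <preformat preformat-type="code"><![CDATA[
```python
"""Sketch: compose a chatbot prompt from the four elements described in Section 2.6."""


def build_prompt(query_language, task, databases, wanted, combine=None):
    """Join the prompt elements into one instruction string.

    query_language: i) the query language to be used.
    task:           ii) the task the chatbot needs to perform.
    databases:      iii) the databases to get data from.
    wanted:         iv) the specific data being requested.
    combine:        optional note on how to join sources (federated queries).
    """
    parts = [
        "Write a single %s query to %s." % (query_language, task),
        "Use the following databases: %s." % ", ".join(databases),
        "Return: %s." % ", ".join(wanted),
    ]
    if combine:
        parts.append("Combine the sources as follows: %s." % combine)
    return " ".join(parts)


# Illustrative values loosely modelled on P4; not the prompt used in the study.
prompt = build_prompt(
    "federated SPARQL",
    "retrieve drugs that affect humans and mammals",
    ["DrugBank", "KEGG"],
    ["drug label", "CAS registry number", "mass"],
    combine="match DrugBank and KEGG entries on the CAS registry number",
)
print(prompt)
```
]]></preformat>
        <p>Keeping the join instruction as a separate optional argument mirrors the distinction made above between single-source prompts and federated prompts.</p>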
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Query Formulation through Chatbots</title>
      <p>It is worth noting that, when asked Q2 to Q5, the chatbots formulated queries that were incomplete, inconsistent or simply wrong. For this reason, more elaborate, descriptive and direct prompts were written (Section 2.6). This section presents the queries formulated in response to the different prompts. In this paper we present only the queries formulated for three of the five prompts (Section 2.6): P1 (single-source query), P4 (simple federated query) and P5 (complex federated query). These three queries are representative of the simple and complex queries overall, and form a good sample of the available queries that also covers different SPARQL constructs.</p>
      <p>The queries (P1, P4 and P5) generated by the AI chatbots are available on GitHub:
https://github.com/Sourav-rcsi/Artificial-Intelligence-based-Chatbots-for-generatingFederated-SPARQL-queries.git</p>
    </sec>
    <sec id="sec-5">
      <title>4. Query Results</title>
      <p>Table 4 summarises whether the AI chatbots provided correct database descriptions and SPARQL endpoint URLs, and whether the syntax of the queries generated for P1, P4 and P5 was correct.</p>
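      <table-wrap>
        <table>
          <thead>
            <tr><th>Chatbots</th><th>P4</th><th>P5</th></tr>
          </thead>
          <tbody>
            <tr><td>ChatGPT</td><td>!</td><td>×</td></tr>
            <tr><td>Gemini</td><td>×</td><td>×</td></tr>
            <tr><td>Copilot</td><td>!</td><td>×</td></tr>
            <tr><td>Chatsonic</td><td>!</td><td>×</td></tr>
            <tr><td>YouChat</td><td>!</td><td>×</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-1">
        <title>When asked “What is the result of this query?”</title>
        <p>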
          <list list-type="order">
            <list-item><p>ChatGPT: I’m unable to directly execute federated SPARQL queries. You can run the provided SPARQL query on your local SPARQL endpoint or an online SPARQL query service that supports federated queries to obtain the results.</p></list-item>
            <list-item><p>Gemini: I can’t directly execute the SPARQL query to retrieve the results from the Drugbank database. However, I can explain what the query retrieves and how to potentially interpret the results.</p></list-item>
            <list-item><p>Copilot: I apologise, but I am unable to execute the SPARQL query directly. However, you can run this query using a SPARQL endpoint that supports federated queries. The result would be a list of distinct drugs meeting the specified criteria, along with their descriptions, molecular weights, compounds, reaction titles, and chemical equations. If you have access to a SPARQL endpoint, you can input the query there to retrieve the actual results.</p></list-item>
            <list-item><p>Chatsonic: I’m unable to directly execute the query against the databases, but when you run the provided federated SPARQL query, the result will be a table and will include distinct drugs from DrugBank with the drug type “small molecule”, along with their descriptions and KEGG compound IDs. Additionally, it will provide information about enzymes, chemical reactions, reaction titles, and chemical equations associated with the selected compounds from the KEGG database.</p></list-item>
            <list-item><p>YouChat: I’m sorry, but I couldn’t find any search results for the specific federated SPARQL query you requested. If you have any other questions or need further assistance, feel free to ask!</p></list-item>
          </list>
        </p>
        <p>It is important to note that the chatbots could not produce the results of the queries, as they are currently unable to run SPARQL queries against the endpoints. In some cases, however, they can give hints about what an example result of the query would look like.</p>
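        <p>Because the chatbots do not execute queries, the user has to submit a generated query to an endpoint. A minimal sketch using only the Python standard library is given below. It assumes that the endpoint accepts SPARQL 1.1 Protocol GET requests with a query parameter and can return JSON results; the function names build_request and run_query are illustrative, and run_query needs network access and a reachable endpoint.</p>
        <preformat><![CDATA[
```python
import json
import urllib.parse
import urllib.request

SPARQL_JSON = "application/sparql-results+json"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Encode the query as the 'query' parameter of a GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": SPARQL_JSON})


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Send the query and return the list of result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["results"]["bindings"]


if __name__ == "__main__":
    # Only builds and prints the request URL; nothing is sent over the network.
    req = build_request(
        "http://drugbank.bio2rdf.org/sparql",
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 5",
    )
    print(req.full_url)
```
]]></preformat>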
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Discussion</title>
      <sec id="sec-6-1">
        <title>5.1. Database and SPARQL endpoint</title>
        <p>DBpedia: When the chatbots were asked “what is the DBpedia database”, all of them generated a correct and relevant description, well structured and in simple, easy-to-understand language, except Copilot, whose description was slightly more complex. The bots were also asked to limit their answers to 60 words; all stayed under 60 words except Gemini, which generated more. However, the description generated by Gemini is the simplest to understand and goes further by giving an example of how the database can be used. When asked whether “the DBpedia database has a SPARQL endpoint”, all the bots said yes and generated the correct URL for connecting to the SPARQL endpoint.</p>
        <p>DrugBank: When the chatbots were asked “what is the DrugBank database”, all gave correct and relevant information about the database, with good but varied structure, in simple, easy-to-understand language. All bots except Gemini limited their answers to fewer than 60 words. Gemini, ChatGPT and Copilot included examples of who can use the database and how, while YouChat stated in which industry the database is used. Interestingly, the structure of the answers differs slightly. All give the description first; ChatGPT, Copilot and Chatsonic then give the applications and finish with a general statement, while Gemini and YouChat give the applications in the last sentence. When the bots were asked whether “the DrugBank database has a SPARQL endpoint and to provide its URL”, ChatGPT could not provide the relevant information because its information cut-off is January 2022. Gemini pointed to the correct resource where the DrugBank SPARQL endpoint can be found, while YouChat correctly stated that an endpoint exists but provided a non-existent URL. Microsoft Copilot pointed to the official DrugBank website and provided its URL, while Chatsonic could not provide a URL for either the endpoint or the website, but suggested consulting the official DrugBank resources for this information.</p>
        <p>KEGG: When the chatbots were asked “what is the KEGG database”, all again gave correct and relevant information about the database, with good but varied structure, in human-understandable language. For this prompt only ChatGPT and Gemini limited their answers to 60 words, while Copilot, Chatsonic and YouChat generated more than 60 words. Gemini again generated a similarly descriptive answer with examples. The structure of the answers again differs slightly between bots but is consistent for each bot: ChatGPT, Copilot and Chatsonic give the description first, then the applications, and finish with a general statement, while Gemini gives the applications in the last sentence. YouChat did not mention any application in this case, possibly because the answer was limited to 60 words. When the chatbots were asked whether “the KEGG database has a SPARQL endpoint and to provide its URL”, ChatGPT gave the same answer as above, stating that it does not have up-to-date information. Gemini incorrectly stated that there is no SPARQL endpoint for the database and provided two URLs, of which the second does not exist and the first leads to a page for downloading KEGG data. Copilot correctly identified that a SPARQL endpoint for the KEGG database exists, but provided a link only to the database homepage. Chatsonic correctly identified that a SPARQL endpoint exists and provided a link to it, while YouChat could not provide any URL.</p>
      </sec>
      <sec id="sec-6-2">
        <title>5.2. Queries</title>
        <p>In this paper we present the results of only three prompts: P1 (single-source query), P4 (simple federated query) and P5 (complex federated query). This section compares the human-made queries with the chatbot-generated queries with respect to the correctness of their syntax. The comparison is admittedly limited, as it is based entirely on the findings for these five queries. The complexity of the queries could be increased further, and the results might then differ; this remains an open question for investigation.</p>
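        <p>A first mechanical screen of a generated query, applied before the manual comparison, could look like the sketch below. It is a rough structural check under stated assumptions and not a SPARQL parser: it only looks for balanced braces, a query form keyword, undeclared prefixes and, for federated queries, a SERVICE clause. The function name structural_issues and the choice of checks are illustrative and are not part of the evaluation procedure reported here.</p>
        <preformat><![CDATA[
```python
import re


def structural_issues(query: str, federated: bool = False) -> list:
    """Return the problems spotted; an empty list means none of the checks failed."""
    issues = []
    # Drop IRIs and string literals so their contents do not confuse the checks.
    body = re.sub(r'<[^<>\s]*>|"[^"\n]*"', " ", query)

    # Braces must balance and must never close before they open.
    depth = 0
    for ch in body:
        depth += (ch == "{") - (ch == "}")
        if depth < 0:
            break
    if depth != 0:
        issues.append("unbalanced braces")

    if not re.search(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", body, re.IGNORECASE):
        issues.append("missing query form")

    # Every prefix used in a prefixed name should have a PREFIX declaration.
    declared = set(re.findall(r"\bPREFIX\s+(\w*):", body, re.IGNORECASE))
    used = set(re.findall(r"(?<![\w?$])(\w+):\w", body))
    for prefix in sorted(used - declared):
        issues.append(f"undeclared prefix: {prefix}")

    if federated and not re.search(r"\bSERVICE\b", body, re.IGNORECASE):
        issues.append("no SERVICE clause in a federated query")
    return issues
```
]]></preformat>
        <p>A query that passes this screen can still be wrong, for example by using predicates that do not exist at the endpoint, so the check complements rather than replaces execution and manual review.</p>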
        <p>P1 query: To “find all drugs from the DrugBank database”, we created a prompt describing in detail each step the SPARQL query needs to perform. We started by asking for a single query, as we had noticed that the bots would generate several separate SPARQL queries to address one question containing several steps. Next, we explicitly stated the query that needed to be generated and provided the exact DrugBank SPARQL endpoint URL. We then explained in more detail that the query should find the distinct drugs in the DrugBank database. Here, we asked for the query only (the explanation could alternatively be limited to 50 words, 5 sentences or 1 paragraph, if needed). The simple SPARQL queries generated by ChatGPT, Microsoft Copilot and YouChat are correct and directly executable at the DrugBank SPARQL endpoint. The SPARQL endpoints provided by Chatsonic and Gemini are incorrect: their queries are not directly executable because they point to a non-existent URL. Interestingly, both were given the DrugBank SPARQL endpoint URL in the prompt and asked to use it, and both have access to up-to-date information, unlike ChatGPT (access to data up to January 2022), so an accurate SPARQL endpoint was expected from them. This was a relatively simple query, and all the chatbots generated correct SPARQL syntax for finding all drugs in the DrugBank database.</p>
        <p>P4 query: To “find all drugs that affect humans and mammals and get the description of their biotransformation, if available”, the prompt was written to describe in detail each step the SPARQL query needs to perform. We started by specifying the type of SPARQL query and the database to be queried (a federated query and DrugBank, respectively, in this case). For federated queries, we found it important to include the word “single”, as the bots might otherwise generate several queries for a single question. We also asked for all the drugs in this database that affect “humans and mammals”, placing the phrase in quotation marks to emphasise its significance. Next, we asked for the CAS registry number of each drug found in DrugBank, and for the same drug to be looked up in the KEGG database by its CAS number in order to obtain its mass. Additionally, we asked for the drug label and the description of the biotransformation to be collected from DrugBank only if that information is available. The prompt asked for the query only, without descriptions or explanations, as explanations of generated queries can be lengthy and vary in detail. The query generated by ChatGPT uses meaningfully named variables, has a simpler structure with prefixes, and uses a UNION clause to combine the results from both databases. Gemini created a very detailed query with explanatory comments, but used constructs that are unnecessary for this query, such as BIND. This is clearly not the correct query and would not return the correct results. Copilot, Chatsonic and YouChat created nearly correct queries with some limitations; Copilot's query, for example, includes additional constructs such as FILTER that make the query more complex and more expensive to run.</p>
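        <p>The steps of prompt P4 map onto a federated query roughly as sketched below: one SERVICE block per endpoint, a shared ?cas variable acting as the join key, and OPTIONAL for the fields requested “if available”. This is a skeleton under stated assumptions and neither the human-authored query nor any chatbot output. The KEGG endpoint URL is a placeholder, and every term in the ex: namespace is a hypothetical predicate that would have to be replaced by the vocabulary of the respective database.</p>
        <preformat><![CDATA[
```python
DRUGBANK_ENDPOINT = "http://drugbank.bio2rdf.org/sparql"  # as given in prompt P1
KEGG_ENDPOINT = "http://kegg.example.org/sparql"  # placeholder, not a real endpoint


def build_p4_skeleton(drugbank: str = DRUGBANK_ENDPOINT, kegg: str = KEGG_ENDPOINT) -> str:
    """Assemble a federated query skeleton; every ex: term is a hypothetical predicate."""
    return f"""PREFIX ex: <http://example.org/vocab#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?drug ?cas ?mass ?label ?biotransformation
WHERE {{
  SERVICE <{drugbank}> {{
    ?drug ex:affectedOrganism "humans and mammals" ;
          ex:casRegistryNumber ?cas .
    OPTIONAL {{ ?drug rdfs:label ?label . }}
    OPTIONAL {{ ?drug ex:biotransformation ?biotransformation . }}
  }}
  SERVICE <{kegg}> {{
    ?keggDrug ex:casRegistryNumber ?cas ;
              ex:mass ?mass .
  }}
}}"""


if __name__ == "__main__":
    print(build_p4_skeleton())
```
]]></preformat>
        <p>Placing the label and biotransformation patterns in separate OPTIONAL blocks means that a drug is still returned when only one of the two values is present.</p>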
        <p>P5 query: To “find the equations of chemical reactions and reaction title related to drugs with drug description and drug type ‘small molecule’ with molecular weight average larger than 114”, we created a highly detailed prompt for retrieving this information by querying the DrugBank and KEGG databases. We started with a high-level overview of the question, asking for a single federated SPARQL query that returns the distinct drugs, and specified the exact information we were looking for. For federated queries, we found it important to include the word “single”, as the bots might otherwise generate several queries for a single question. Next, we specified the two databases to be queried and stated that only drugs with the drug type “small molecule” must be retrieved from the DrugBank database, along with their descriptions and KEGG compound IDs. From the KEGG database, information about the enzymes, chemical reactions, reaction titles and chemical equations associated with the compounds selected from DrugBank must be gathered. Finally, the drugs must be filtered to include only those with a molecular weight average greater than 114. Here, we asked for the query only, without a description of the generated query, as explanations of these queries can vary in detail and length. None of the generated complex queries was correct in the first iteration, and in this research we did not aim to refine the prompt over multiple iterations in order to obtain a correct SPARQL query. Hence we stopped our experiment at this stage.</p>
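        <p>The molecular-weight condition in P5 corresponds to a SPARQL FILTER. A minimal sketch of that clause follows. The variable name ?mwAverage and the helper name weight_filter are hypothetical, the xsd prefix is assumed to be declared in the surrounding query, and the cast is a precaution in case the weight is stored as a plain string, which is an assumption and not something stated in this paper.</p>
        <preformat><![CDATA[
```python
def weight_filter(variable: str = "?mwAverage", threshold: float = 114) -> str:
    """Return a FILTER clause keeping rows whose numeric value exceeds the threshold."""
    # The xsd:decimal cast guards against weights stored as plain string literals.
    return f"FILTER(xsd:decimal({variable}) > {threshold})"


if __name__ == "__main__":
    print(weight_filter())
```
]]></preformat>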
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Limitations</title>
      <p>Generative AI has recently found use in a variety of applications [24], including medical examinations, education, bioinformatics and plastic surgery [25]. It has outperformed several previous AI architectures and achieved good results.</p>
      <p>In this article, the comparison of generative AI chatbots is based entirely on the findings for these five queries. The complexity of the queries could be increased further, and the results might then differ; this remains an open question for investigation.</p>
      <p>Since generative AI is a new technology, it is subject to the following limitations:</p>
      <list list-type="order">
        <list-item><p>Understanding SPARQL Basics: While generative AI can generate SPARQL queries from natural language (English), a basic understanding of core SPARQL concepts, keywords and syntax helps researchers to interpret the results returned by these queries and to modify the generated queries.</p></list-item>
        <list-item><p>Background knowledge: Writing improved prompts requires: i) some background knowledge of biomedical databases, ii) information about the concepts that need to be extracted and iii) knowledge of how the results of federated queries can be combined. Moreover, it is important to provide clear and concise prompts to the AI chatbots. The more detailed and concise the prompts are, the higher the probability of getting a comprehensive response.</p></list-item>
        <list-item><p>Multiple Databases: Not all databases have the same underlying data model [<xref ref-type="bibr" rid="ref2">2</xref>]. Federated SPARQL queries require retrieving and combining information from multiple databases. Non-technical users can find the results hard to interpret if the databases have different data models. Such differences can also affect the results and introduce errors into the generated query. Basic knowledge of the structure of the databases is required to tell AI chatbots how to combine the results from multiple databases.</p></list-item>
        <list-item><p>Accuracy of Prompts: The results generated by generative AI depend directly on the prompts or questions asked by the user [26]. The prompts must be clear, concise and finely tuned to the research problem in question. They must also be highly detailed, explaining what must be done at each step. The accuracy of the generated results can be greatly improved by improving the prompt or question.</p></list-item>
        <list-item><p>Bias and Interpreting results: For complex federated queries, the data retrieved from various databases can be large in scale [<xref ref-type="bibr" rid="ref2">2</xref>]. An understanding of bias, background knowledge of the problem and basic SPARQL skills, or consultation with domain experts, can improve the interpretation of results. Generative AI tools should be used iteratively, starting with simpler queries or questions to understand the type and accuracy of the results they generate before moving on to complex federated queries [26].</p></list-item>
        <list-item><p>Generative AI tool: Depending on the question, the right generative AI tool or a combination of tools should be used. If up-to-date information is needed, Google, Microsoft AI or ChatGPT-4 should be preferred over ChatGPT-3.5, which has information only up to January 2022. For example, ChatGPT-3.5 will not have current updates or information on new databases. Each tool has its own pros and cons, and a study of the different tools would help researchers select the one that suits their specific problem.</p></list-item>
      </list>
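      <p>The advice on prompts given above (state the query type, name the databases and spell out each step) can be captured in a small prompt template. The following is a sketch only: the function build_prompt and its parameters are illustrative assumptions and do not describe how the prompts in this study were written.</p>
      <preformat><![CDATA[
```python
def build_prompt(databases, steps, federated=True, query_only=True):
    """Compose a step-by-step prompt asking a chatbot for one SPARQL query."""
    kind = "federated SPARQL query" if federated else "SPARQL query"
    lines = [f"Generate a single {kind} using the {' and '.join(databases)} databases."]
    # One explicit sentence per step keeps the request unambiguous.
    lines += [f"Step {i}: {step}" for i, step in enumerate(steps, start=1)]
    if query_only:
        lines.append("Return only the query, without explanation.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        ["DrugBank", "KEGG"],
        [
            'From DrugBank, select drugs that affect "humans and mammals".',
            "For each drug, get its CAS registry number.",
            "Use the CAS number to find the same drug in KEGG and obtain its mass.",
        ],
    ))
```
]]></preformat>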
    </sec>
    <sec id="sec-8">
      <title>7. Conclusion</title>
      <p>The future of data access is likely to involve federated open research data, driven by the growing number of datasets and databases. Technological advancements are necessary to bring federated data closer to users, particularly through improved user-facing services. Our research assesses the competence of AI chatbots (ChatGPT, Gemini, Copilot, Chatsonic and YouChat) in generating SPARQL query syntax across diverse databases. Comparing their outputs with human-authored queries shows that these chatbots still have significant limitations, and we identify areas of strength and opportunities for improvement. This study fosters synergy between data science and healthcare, facilitating more efficient query formulation and advancing interdisciplinary research initiatives for working biologists who are unable to query biomedical databases because of the technical nature of constructing SPARQL queries over RDF biological databases. In summary, we offer the following reflections:</p>
      <list list-type="bullet">
        <list-item><p>Publicly available chatbots have potential for the findability and reuse of databases, helping researchers to discover relevant information about the databases.</p></list-item>
        <list-item><p>Conversational AI chatbots such as ChatGPT, Gemini and Copilot offer high-level database summaries, improving researchers’ understanding of database contents.</p></list-item>
        <list-item><p>Domain experts can use these chatbots to explain SPARQL queries, while also contributing to model improvement through feedback.</p></list-item>
        <list-item><p>AI chatbots can be trained and fine-tuned to generate better results suited to specific needs.</p></list-item>
        <list-item><p>Caution is advised when using AI chatbots for data access because of potential hallucinations. Filtering methods based on confidence levels in language models may be explored, with user validation remaining crucial for the results generated by these chatbots in their current form.</p></list-item>
      </list>
      <p>
[10] M. R. Kamdar, D. Zeginis, A. Hasnain, S. Decker, H. F. Deus, Reveald: A user-driven
domain-specific interactive search platform for biomedical research, Journal of biomedical
informatics 47 (2014) 112–130.
[11] A. Hasnain, Q. Mehmood, S. S. e Zainab, A. Hogan, Sportal: Searching for public sparql
endpoints., in: ISWC (Posters &amp; Demos), 2016.
[12] A. Hasnain, Q. Mehmood, S. S. e Zainab, A. Hogan, Sportal: profiling the content of public
sparql endpoints, in: Information Retrieval and Management: Concepts, Methodologies,
Tools, and Applications, IGI Global, 2018, pp. 368–401.
[13] K. Chowdhary, K. Chowdhary, Natural language processing, Fundamentals of artificial
intelligence (2020) 603–649.
[14] C. Wang, M. Li, A. J. Smola, Language models with transformers, arXiv preprint
arXiv:1904.09408 (2019).
[15] P. Zhang, M. N. Kamel Boulos, Generative ai in medicine and healthcare: Promises,
opportunities and challenges, Future Internet 15 (2023) 286.
[16] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I.
Polosukhin, Attention is all you need, Advances in neural information processing systems 30
(2017).
[17] B. Meskó, E. J. Topol, The imperative for regulatory oversight of large language models
(or generative ai) in healthcare, NPJ digital medicine 6 (2023) 120.
[18] K. Nova, Generative ai in healthcare: advancements in electronic health records, facilitating
medical languages, and personalized patient care, Journal of Advanced Analytics in
Healthcare Management 7 (2023) 115–131.
[19] J. Varghese, J. Chapiro, Chatgpt: The transformative influence of generative ai on science
and healthcare, Journal of Hepatology (2023).
[20] S. S. Biswas, Role of chat gpt in public health, Annals of biomedical engineering 51 (2023)
868–869.
[21] P. P. Ray, Chatgpt: A comprehensive review on background, applications, key challenges,
bias, ethics, limitations and future scope, Internet of Things and Cyber-Physical Systems
(2023).
[22] G. AI, An overview of bard: an early experiment with generative ai, https://ai.google/
static/documents/google-about-bard.pdf, 2023.
[23] B. Meskó, Prompt engineering as an important emerging skill for medical professionals:
tutorial, Journal of Medical Internet Research 25 (2023) e50638.
[24] Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang,
et al., A survey on evaluation of large language models, ACM Transactions on Intelligent
Systems and Technology (2023).
[25] J. Abi-Rafeh, H. H. Xu, R. Kazan, R. Tevlin, H. Furnas, Large language models and artificial
intelligence: a primer for plastic surgeons on the demonstrated and potential applications,
promises, and limitations of chatgpt, Aesthetic Surgery Journal 44 (2024) 329–343.
[26] S. Sai, A. Gaur, R. Sai, V. Chamola, M. Guizani, J. J. Rodrigues, Generative ai for
transformative healthcare: A comprehensive study of emerging models, applications, case studies
and limitations, IEEE Access (2024).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hasnain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <article-title>Largerdfbench: A billion triples benchmark for sparql endpoint federation</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>48</volume>
          (
          <year>2018</year>
          )
          <fpage>85</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.-C.</given-names>
            <surname>Sima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>de Farias</surname>
          </string-name>
          ,
          <article-title>On the potential of artificial intelligence chatbots for data exploration of federated bioinformatics knowledge graphs</article-title>
          ,
          <source>arXiv preprint arXiv:2304.10427</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hasnain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ermilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C.</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          ,
          <article-title>A fine-grained evaluation of sparql endpoint federation systems</article-title>
          ,
          <source>Semantic Web</source>
          <volume>7</volume>
          (
          <year>2016</year>
          )
          <fpage>493</fpage>
          -
          <lpage>518</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <article-title>A survey of rdf stores &amp; sparql engines for querying knowledge graphs</article-title>
          ,
          <source>The VLDB Journal</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shokrollahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yarmohammadtoosky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Nikahd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <article-title>A comprehensive review of generative ai in healthcare</article-title>
          ,
          <source>arXiv preprint arXiv:2310.00795</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hasnain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Kamdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hasapis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zeginis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. N.</given-names>
            <surname>Warren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. F.</given-names>
            <surname>Deus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ntalaperas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tarabanis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mehdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <article-title>Linked biomedical dataspace: lessons learned integrating data for drug discovery</article-title>
          ,
          <source>in: The Semantic Web-ISWC 2014: 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23</source>
          ,
          <year>2014</year>
          .
          <source>Proceedings, Part I 13</source>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>114</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hasnain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Mehmood</surname>
          </string-name>
          , S. Sana e Zainab,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Warren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zehra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rebholz-Schuhmann</surname>
          </string-name>
          ,
          <article-title>Biofed: federated query processing over life sciences linked open data</article-title>
          ,
          <source>Journal of biomedical semantics 8</source>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hasnain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S. e</given-names>
            <surname>Zainab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zehra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Mehmood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rebholz-Schuhmann</surname>
          </string-name>
          ,
          <article-title>Federated query formulation and processing through biofed</article-title>
          ., in: SeWeBMeDA@ ESWC,
          <year>2017</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. S. e</given-names>
            <surname>Zainab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Mehmood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zehra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hasnain</surname>
          </string-name>
          ,
          <article-title>Fedviz: A visual interface for sparql queries formulation and execution</article-title>
          .,
          <source>VOILA@ ISWC</source>
          <volume>1456</volume>
          (
          <year>2015</year>
          )
          <fpage>49</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>