<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>On the extraction of meaningful RNA interactions from Scientific Publications through LLMs and SPIRES</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emanuele Cavalleri</string-name>
          <email>emanuele.cavalleri@unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Mesiti</string-name>
          <email>marco.mesiti@unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano</institution>
          ,
          <addr-line>Via Celoria 18, Milano</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Knowledge graphs (KGs) are useful tools to uniformly represent and integrate heterogeneous information about a domain of interest. However, they are inherently incomplete; therefore, new facts should be introduced by extracting them from structured and unstructured data sources. In this paper, starting from RNA-KG (the first KG tailored to representing different kinds of RNA molecules, which we recently developed), we evaluate the use of SPIRES for extracting interactions among bio-entities involving RNA molecules from scientific papers, guided by the RNA-KG schema. SPIRES is a general-purpose knowledge extraction system for mining information that conforms to a specified schema. A customized prompt is generated and submitted to a Large Language Model (LLM) along with a text, in order to extract a set of RDF triples adhering to the schema constraints. The experiments show high accuracy in extracting interactions from the scientific literature.</p>
      </abstract>
      <kwd-group>
        <kwd>RNA-based technologies</kwd>
        <kwd>Knowledge Graphs</kwd>
        <kwd>RNA-drug discovery</kwd>
        <kwd>Large Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The “RNA world” represents a novel frontier for the study of fundamental biological processes and human diseases, and it is paving the way for the development of new drugs. Although scientific data about coding and non-coding RNA molecules are continuously produced and made available in public repositories, they are scattered across different databases and the scientific literature. A centralized, uniform, and semantically consistent representation of the knowledge on RNA is still lacking. We have recently constructed RNA-KG [<xref ref-type="bibr" rid="ref1">1</xref>], a knowledge graph integrating biological knowledge about RNA molecules with their functional relationships with genes, proteins, chemicals, and biomedical ontological concepts. RNA-KG includes around 600K nodes and 9M RDF triples representing reliable interactions involving RNA molecules and related biomedical concepts, extracted from more than 50 public data sources according to 11 bio-ontologies.</p>
      <sec id="sec-1-1">
        <p>RNA-KG is coupled with a meta-graph representing all the possible interactions involving RNA molecules. SPIRES (Structured Prompt Interrogation and Recursive Extraction of Semantics) [<xref ref-type="bibr" rid="ref2">2</xref>] is a recently proposed approach to information extraction that exploits Large Language Models (LLMs) [<xref ref-type="bibr" rid="ref3">3</xref>] to identify instances of a knowledge schema, expressed in terms of LinkML [<xref ref-type="bibr" rid="ref4">4</xref>], starting from plain texts. It adopts zero-shot learning to identify and extract, from an input text, the relevant entities and the relationships among them, which are then normalized and grounded through ontologies and vocabularies.</p>
      </sec>
      <sec id="sec-1-2">
        <p>Published in the Proceedings of the Workshops of the EDBT/ICDT 2024, CEUR Workshop Proceedings (CEUR-WS.org).</p>
      </sec>
      <sec id="sec-1-3">
        <p>SPIRES is a general-purpose approach that can be used without specific training or tuning on the considered domain. It adopts a prompt-engineering approach for interacting with an LLM (like GPT [<xref ref-type="bibr" rid="ref5">5</xref>], Llama 2 [<xref ref-type="bibr" rid="ref6">6</xref>], Mistral [<xref ref-type="bibr" rid="ref7">7</xref>], and Zephyr [<xref ref-type="bibr" rid="ref8">8</xref>]) to improve the quality of the generated responses [9].</p>
      </sec>
      <sec id="sec-1-4">
        <p>In this way, technical challenges for generative AI (e.g., constructing comprehensive real-world knowledge and improving the accuracy of automated responses) can be addressed.</p>
      </sec>
      <sec id="sec-1-5">
        <p>In this paper, we discuss the initial experimental results that we obtained by applying SPIRES to the extraction of interactions among bio-entities involving RNA molecules, in the context of the PNRR project “Gene Therapy and Drugs based on RNA Technology”.</p>
      </sec>
      <sec id="sec-1-6">
        <p>The purpose of the experiments is to show the level of accuracy of the system in extracting interactions from the scientific literature and to investigate the possibility of combining RNA-KG with […]</p>
      </sec>
      <sec id="sec-1-7">
        <p>[…] RNA molecules is particularly challenging for two reasons. First, a well-recognized ontology for characterizing non-coding RNA molecules is still lacking; second, different identifiers are adopted for representing the same bio-entity. Although a more systematic evaluation is still needed, the initial results are very encouraging.</p>
      </sec>
      <sec id="sec-1-8">
        <p>The paper is structured as follows. Section 2 describes the SPIRES approach and related approaches that integrate LLMs with knowledge data. Section 3 presents the LinkML schema that we have developed for interacting with SPIRES. Section 4 describes the experimental results, while Section 5 reports concluding remarks.</p>
      </sec>
      <sec id="sec-1-9">
        <p>[Figure 1: outline of the SPIRES workflow. Its schema panel shows an excerpt of the LinkML class Protein, with the attribute label (description: “the name of the protein”) and the annotation annotators: sqlite:obo:pr.]</p>
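        <p>One step of the Figure 1 workflow can be illustrated with a toy Python sketch for the Protein class: a prompt is generated from the class attributes, the LLM answer is parsed, and the returned label is grounded. This is a minimal sketch under stated assumptions, not the SPIRES implementation: a plain dict replaces the LinkML YAML, the stub fake_llm replaces a real LLM call, and the hypothetical table PR_LOOKUP replaces the sqlite:obo:pr annotator. All names are illustrative, the PRO identifier is a placeholder, and the recursion over nested classes is omitted.</p>
        <preformat>
```python
# Toy sketch of one schema-guided extraction step (prompt, parse, ground).
# All names are hypothetical illustrations, not the SPIRES/OntoGPT API.

# LinkML-like class mirroring the Protein excerpt of Figure 1, written as a
# plain dict instead of YAML so that no extra package is needed.
PROTEIN_CLASS = {
    "name": "Protein",
    "attributes": {"label": {"description": "the name of the protein"}},
    "annotations": {"annotators": "sqlite:obo:pr"},
}

# Hypothetical look-up table standing in for the PRO annotator;
# "PR:PLACEHOLDER" is not a real PRO identifier.
PR_LOOKUP = {"cox20": "PR:PLACEHOLDER"}


def build_prompt(cls: dict, text: str) -> str:
    """Generate a prompt listing one 'field: description' line per attribute."""
    lines = [f"From the text below, extract a {cls['name']}."]
    for field, spec in cls["attributes"].items():
        lines.append(f"{field}: {spec['description']}")
    lines.append("Text: " + text)
    return "\n".join(lines)


def fake_llm(prompt: str) -> str:
    """Stub LLM: answers with a raw symbol, as ChatGPT alone does in Example 1."""
    return "label: COX20"


def parse_response(response: str) -> dict:
    """Parse the 'field: value' lines returned by the (stub) LLM."""
    parsed = {}
    for line in response.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            parsed[key.strip()] = value.strip()
    return parsed


def ground(value: str, lookup: dict) -> str:
    """Map a label to an ontology identifier; flag it with AUTO: otherwise."""
    return lookup.get(value.lower(), "AUTO:" + value)


def extract(cls: dict, text: str) -> dict:
    """One step: prompt the LLM, parse its answer, ground the label."""
    raw = parse_response(fake_llm(build_prompt(cls, text)))
    label = raw.get("label", "")
    return {"label": label, "id": ground(label, PR_LOOKUP)}


print(extract(PROTEIN_CLASS, "... COX20 ..."))
```
        </preformat>
        <p>Running the sketch prints {'label': 'COX20', 'id': 'PR:PLACEHOLDER'}. A label missing from the table is kept with an AUTO: prefix, a marker chosen for this sketch to flag ungrounded values.</p>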
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. SPIRES and Related Work</title>
      <sec id="sec-2-1">
        <p>The population of a KG by extracting triples from unstructured texts is an active research area, and the advent of LLMs has boosted the interpretation of highly technical languages, as shown on question-answering benchmarks [10]. However, these techniques have shown several limitations, such as generating incorrect statements due to hallucinations [11] and insensitivity to negations [12], which cannot be tolerated in sensitive domains like precision medicine. To reduce these drawbacks, SPIRES adopts: i) the knowledge schema of a specific domain for the generation of prompts; and ii) bio-ontologies for enhancing the quality of the produced information.</p>
        <p>Figure 1 outlines the SPIRES workflow. SPIRES requires the specification of the knowledge schema, expressed in LinkML [<xref ref-type="bibr" rid="ref4">4</xref>], to guide the system in the extraction of knowledge. A LinkML schema contains the classes of entities and the relationships among them within the specified domain. Classes can also include attributes (e.g., name, type, and list of synonyms) to enrich the entity description. The LinkML schema is automatically processed to generate a list of prompts through which SPIRES interacts with an LLM (e.g., GPT-3, GPT-4, Llama 2, Mistral, and Zephyr). Each prompt of the list is submitted to the LLM to collect information that is then used to complete the following prompt, possibly consulting the bio-ontologies (e.g., for replacing a protein symbol with the corresponding identifier in an ontology). This recursive refinement process improves the quality of the information gathered through the LLM.</p>
        <p>Example 1. Suppose we wish to extract proteins from a text. A LinkML expression can be generated for describing the class Protein with its properties and the adopted identification scheme (see Figure 1). A prompt is then generated for this class and used for extracting proteins. However, the result obtained by ChatGPT alone (in this case COX20) is not compliant with the Protein class structure. Therefore, SPIRES exploits bio-ontologies (e.g., the PRotein Ontology, PRO [13]) to obtain an adequate result.</p>
        <p>Furthermore, when relationships are identified, SPIRES selectively retains only those aligned with the predefined schema that can be grounded to the Relations Ontology (RO [14]). By exploiting the standard identification schemes adopted by the reference bio-ontologies, the system guarantees the generation of triples that can be easily integrated into a biomedical KG.</p>
        <p>SPIRES thus creates and refines prompts to maximize the effectiveness of LLMs by exploiting the domain knowledge encapsulated in the description of the classes and relationships that we wish to include in the KG.</p>
        <p>As outlined in [9], the explicit and structured information contained in KGs can also be used for improving the knowledge awareness of LLMs. KGs have been used: i) in the training of the LLM [15, 16]; ii) during the inference stage, for making the latest knowledge available to the LLM without retraining [17]; iii) to improve the interpretability of LLMs by explaining the facts [18] and by enhancing the reasoning process of LLMs [19]. One of the main disadvantages of solution i) is that enhancing the knowledge contained in the KG requires retraining the model, which is a time- (and money-) consuming activity. For this reason, approaches of type ii) are gaining momentum, because they allow the separation of the text space and the knowledge space. In this case, knowledge is injected at inference time.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. The SPIRES Schema for RNA-KG</title>
      <sec id="sec-2-2">
        <p>For the creation of the schema needed for the application of SPIRES, we considered the RNA-KG meta-graph [20], which represents all the kinds of relationships involving RNA molecules in the considered data sources. Starting from it, we developed a UML class diagram that formally describes the schema of the considered domain and can be used for identifying meaningful relationships in it. Figure 2 shows an excerpt of the generated UML class diagram, which consists of four biological and biomedical classes (miRNA, gene, protein, and disease) with six kinds of RO relationships.</p>
        <p>[Figure 2: excerpt of the UML class diagram. Classes and attributes: miRNA (id: String, description: String, sequence: String, family name: String); gene (id: Integer, type: String, symbol: String); protein (id: String, description: String, synonym: String list, ortholog: String list, sequence: String); disease (id: String, description: String, synonym: String list). Relationships: interacts with, regulates activity of, has gene product, gene product of, causes or contributes to condition, causally related to.]</p>
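        <p>The relationships described for this schema can be encoded as a set of allowed (domain, predicate, range) combinations and used to discard candidate triples that do not fit the schema, in the spirit of SPIRES retaining only schema-aligned relationships. The sketch below is a minimal illustration, not part of SPIRES: it assumes that every entity has already been typed (the hypothetical ENTITY_TYPES mapping), and it leaves out causally related to, whose endpoints the text does not spell out.</p>
        <preformat>
```python
# Minimal sketch: allowed (domain, predicate, range) combinations taken from
# the relationships described for the RNA-KG schema, used as a filter.
# ENTITY_TYPES and the candidate triples are hypothetical inputs.

ALLOWED = {
    ("miRNA", "interacts with", "gene"),
    ("miRNA", "regulates activity of", "gene"),
    ("miRNA", "regulates activity of", "miRNA"),
    ("gene", "has gene product", "protein"),
    ("protein", "gene product of", "gene"),
    ("protein", "regulates activity of", "gene"),
    ("protein", "causes or contributes to condition", "disease"),
    ("miRNA", "causes or contributes to condition", "disease"),
}


def filter_triples(triples, entity_types):
    """Keep only the triples whose typed signature appears in ALLOWED."""
    kept = []
    for subj, pred, obj in triples:
        signature = (entity_types.get(subj), pred, entity_types.get(obj))
        if signature in ALLOWED:
            kept.append((subj, pred, obj))
    return kept


# Illustrative typing and candidates (placeholders for the check, not facts).
ENTITY_TYPES = {
    "hsa-let-7b-5p": "miRNA",
    "IL6": "gene",
    "TP53": "protein",
    "osteoporosis": "disease",
}
candidates = [
    ("hsa-let-7b-5p", "interacts with", "IL6"),
    ("TP53", "causes or contributes to condition", "osteoporosis"),
    ("TP53", "interacts with", "osteoporosis"),
]
print(filter_triples(candidates, ENTITY_TYPES))
```
        </preformat>
        <p>With these inputs, the first two candidates are kept, while the protein-interacts with-disease triple is discarded because its signature is not among the allowed combinations.</p>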
        <p>miRNA molecules are small non-coding RNAs that play a central role in gene expression via interference pathways, and their misregulation is associated with several diseases [21]. miRNA molecules can generically interact with genes, but they can also more precisely regulate the activity of a gene, when a miRNA molecule blocks the translation of the gene or promotes the degradation of its product. Moreover, miRNA molecules can regulate the activity of other miRNAs, because they form base-pairing interactions with complementary miRNA molecules [22, 23]. The schema also contains the relationships involving genes and proteins. Specifically, the has gene product relation and its inverse gene product of are used for representing that different proteins are translated from the same gene (i.e., isoforms), while regulates activity of is used for representing that a subclass of the proteins (transcription factors) regulates the activity of genes, promoting or down-regulating it by acting as enhancers or repressors. Both proteins and miRNAs are connected to the disease class by the causes or contributes to condition relation. The diagram also contains the main properties that can be associated with these bio-entities (e.g., nucleotide/amino acid sequences, descriptions of molecules/diseases, synonyms).</p>
        <p>The proposed UML class diagram was translated into a LinkML schema. Genes are annotated using HGNC [24] IDs. This choice is motivated by the stability of HGNC IDs even when a gene name or symbol changes. Proteins are grounded to the PRotein Ontology (PRO), while diseases are grounded to both the Monarch Disease Ontology (Mondo [25]) and the Human Phenotype Ontology (HPO [26]). miRNAs were left without semantic annotation, since miRNA labels (e.g., hsa-let-7b-5p) and miRBase [27] accession identifiers (e.g., MIMAT0000063) correspond to CURIE prefixes that are not included in the default SPIRES annotators. miRNA molecules can nevertheless be retrieved manually from the relationships extracted by SPIRES, since their labels follow a pattern (for instance, the “hsa-” prefix indicates human miRNAs and the “mmu-” prefix murine miRNAs; mature miRNAs are designated with the “miR-” substring, whilst “mir” refers to the stem-loop primary transcript). Labels can then be easily translated into miRBase accession identifiers using a look-up table.</p>
        <p>Example 2. A LinkML class used to specify causes or contributes to condition relationships between proteins and diseases is reported in Listing 1. In the expression, we specify that the triples to be extracted represent relationships between proteins and diseases in which the only admitted predicate is causes or contributes to condition (RO:0003302). The expression also reports samples of the kinds of relationships that we wish to extract. The prompt generated for this class relies on the prompts generated for the classes protein and disease, which are used for identifying these bio-entities in the scientific literature. Figure 3 shows an output obtained by using SPIRES and the corresponding result obtained by the simple application of ChatGPT. In the SPIRES output, the extracted interactions are already represented as triples that exploit the required identification scheme. Therefore, checking their presence in RNA-KG and, in the case of new triples, integrating them is facilitated.</p>
        <p>[Listing 1: LinkML template for protein-disease interaction.]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental results</title>
      <p>In this section, we discuss the experiments that we carried out to evaluate SPIRES for extracting interactions involving RNA molecules. Moreover, we compare SPIRES with ChatGPT (ver. GPT-3.5-turbo), which is the LLM internally integrated in SPIRES, and with Llama 2 (ver. llama-2-70b-chat), another well-known and widely used LLM.</p>
      <sec id="sec-4-1">
        <title>4.1. Corpus of Annotated Documents</title>
        <p>To evaluate the extraction of relations aligned with the meta-graph depicted in Figure 2, we manually selected a corpus of scientific articles.</p>
        <p>The corpus consists of 60 articles gathered from PubMed, ResearchGate, and Google Scholar by specifying keyword-based queries like “disease”, “comorbidity”, “protein”, “miRNA”, “miRNA regulation”, and “gene”. From these documents, we identified paragraphs containing useful information to be extracted (e.g., abstract, discussion, or specific subsections within the domain of interest). In the identification of the paragraphs, we took into account the following guidelines: i) the paragraph should contain different kinds of relations between bio-entities (e.g., “miRNA-interacts with-gene” and “miRNA-regulates activity of-gene”), to evaluate the ability of SPIRES to identify the right relations according to the provided meta-graph; ii) the paragraph might also contain irrelevant relationships that should be discarded; iii) different identification schemes can be used in the paragraph, to check the ability of SPIRES to correctly work with them. Paragraphs have been classified according to the kind of bio-entities that they describe and associated with the list of relationships that should be identified according to the adopted meta-graph. For each kind of bio-entity, the following table shows the number of paragraphs containing relationships involving it (note that a paragraph can contain more than one kind).</p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Bio-entity</th><th>Paragraphs</th></tr>
            </thead>
            <tbody>
              <tr><td>Protein</td><td>44</td></tr>
              <tr><td>Disease</td><td>58</td></tr>
              <tr><td>miRNA</td><td>37</td></tr>
              <tr><td>Gene</td><td>21</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>In the considered paragraphs, we have identified six kinds of interactions among the considered bio-entities (reported on the y-axis of the diagram in Figure 4).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Accuracy of Interactions extraction</title>
        <p>For evaluating the obtained predictions, we have used standard metrics (precision, recall, and F-score), computed from the True Positives (TP), False Positives (FP), and False Negatives (FN) with respect to the manually tagged paragraphs. Table 1 reports the results obtained for the considered interactions, ordered by F-score. The results indicate a consistent trend where recall tends to be lower than precision, due to the prevalence of false negatives over false positives. We think this behavior is due to the difficulty of accurately extracting precise relationships from text, especially in distinguishing specific types of relationships. Furthermore, we observe that disease-disease and miRNA-disease interactions present a high F-score. These kinds of interactions are widely studied in the literature, and thus a higher number of publications is available with respect to other interactions (like miRNA-miRNA interactions). Consequently, the abundance of these kinds of relationships contributes to a higher true positive rate. Conversely, the F-score for protein-disease relations is notably low because of low recall. We noticed that many protein-disease relations are undetected, often because they are expressed in complex ways within the text, for instance through the interchangeable use of symbols like “/” and “,” (e.g., “overexpressions in IL6/MEGF8/RELA, and also TP53 are known to cause osteoporosis”). Additionally, mapping proteins to PRO proves challenging when textual information is sparse or ambiguously expressed. For instance, the mention of “PMP-22” solely as “myelin protein 22” instead of “peripheral myelin protein 22” (due to assumptions made by authors) can lead to inaccurate grounding. Despite this, precision remains remarkably high and, in the biomedical context, this is preferable because it prioritizes certainty over ambiguity.</p>
        <p>We also compared our results with the average results achieved by SPIRES in other domains. A marginal improvement has been observed with respect to the results reported for named entity recognition of chemicals and diseases [<xref ref-type="bibr" rid="ref2">2</xref>]. We believe that the slightly higher accuracy is due to the use of multiple ontology annotators, such as PRO for proteins, Mondo and HPO for diseases, and RO for relations.</p>
        <p>[Results per kind of interaction (Table 1, Figures 4 and 5): disease-disease, miRNA-disease, miRNA-miRNA, gene-protein, miRNA-gene, protein-disease.]</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Comparison with other LLMs</title>
        <p>For assessing the performance of SPIRES with respect to ChatGPT and Llama 2, we focused on a subset of 20 documents of the corpus.</p>
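        <p>The precision, recall, and F-score used in Section 4.2, and the TP, FP, and FN rates compared here, follow directly from matching extracted triples against manually tagged ones. The minimal sketch below assumes exact-match comparison of triples. The gold annotation of the osteoporosis sentence quoted in Section 4.2 and the single predicted triple are hypothetical; they only show how missing a “/”-separated list lowers recall while precision stays high.</p>
        <preformat>
```python
# Sketch of precision, recall, and F-score computed from TP/FP/FN counts.
# Assumption: a predicted triple counts as correct only on exact match.


def confusion_counts(predicted, gold):
    """Return (TP, FP, FN) between predicted and manually tagged triples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted.intersection(gold))
    fp = len(predicted.difference(gold))
    fn = len(gold.difference(predicted))
    return tp, fp, fn


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical gold triples for the sentence "overexpressions in
# IL6/MEGF8/RELA, and also TP53 are known to cause osteoporosis".
REL = "causes or contributes to condition"
gold = [(protein, REL, "osteoporosis") for protein in ("IL6", "MEGF8", "RELA", "TP53")]
# Hypothetical prediction that only recovers TP53.
predicted = [("TP53", REL, "osteoporosis")]

tp, fp, fn = confusion_counts(predicted, gold)
print(f"TP={tp} FP={fp} FN={fn}")
print(f"P={precision(tp, fp):.2f} R={recall(tp, fn):.2f} F={f_score(tp, fp, fn):.2f}")
```
        </preformat>
        <p>Here TP=1, FP=0, and FN=3, so precision is 1.00 while recall is 0.25 and the F-score is 0.40, a miniature version of the precision/recall gap discussed in Section 4.2.</p>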
        <p>In these documents, we manually grounded the instances and relationships of the extracted triples. To use ChatGPT and Llama 2, we generated prompts that adhere to the following pattern:</p>
        <preformat>extract triples in the form "subject-relation-object" within this document: [...]</preformat>
        <p>This prompt does not guarantee that identifiers are obtained for the subject and the object of the triples. However, if we submit a further prompt explicitly requesting to map the extracted concepts to appropriate terminologies, both ChatGPT and Llama 2 warn that the provided ontology identifiers are hypothetical and may not correspond to actual ontology identifiers (so hallucinations can occur in this case). Therefore, we decided to replace the grounding process with our manually curated look-up tables [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
        <p>When using ChatGPT (or Llama 2) alone, we do not have to specify the schema, and results are produced through a single interaction with the user. Avoiding the specification of the schema might be interpreted as an advantage of basic LLM approaches, but it is not. Indeed, the schema allows us to restrict the relationships to be extracted to only those that are meaningful in the considered domain. Finally, no look-up table can be exploited for translating class instance names into the corresponding identifiers in the bio-ontologies (thus requiring a manual identification of the identifiers). All these drawbacks are avoided by the use of SPIRES.</p>
        <p>As shown in the bottom part of Figure 5, SPIRES outperforms ChatGPT or Llama 2 alone both in terms of precision and recall. The histogram in Figure 5 points out a large increase in the TP rate and a considerable decrease in the FP and FN rates when adopting SPIRES instead of ChatGPT or Llama 2 alone for extracting relations that adhere to a specified schema within texts.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Concluding remarks</title>
      <sec id="sec-5-1">
        <p>In this paper, we have reported the initial experimentation of the use of SPIRES for extracting triples from
the scientific literature related to RNA molecules, taking advantage of the meta-graph we have realized for the generation of RNA-KG. Although a more systematic analysis is required, the initial results are quite encouraging. To facilitate the reproducibility of our results, our dataset and the LinkML template can be downloaded from https://doi.org/10.5281/zenodo.10671796.</p>
        <p>As future work, we would like to extend the approach by integrating the entire RNA-KG in different ways. First, we will exploit the RNA-KG triples for enhancing the prompts generated by SPIRES. Moreover, RNA-KG can be used for validating the plausibility of the generated triples, acting as a gold standard in the area. Furthermore, we will explore KG-enhanced LLM inference approaches in combination with SPIRES, to further improve the precision of the system by injecting knowledge extracted from RNA-KG at inference time. Finally, we would like to create a web environment that graphically shows the user the predicted triples directly within the graphical representation of the portion of the knowledge graph that will contain them. The user can thus manually check the proposed triples and provide feedback that will be used to improve the quality of the predictions.</p>
      </sec>
    </sec>
    <sec id="sec-back-matter">
      <sec id="sec-ack">
        <title>Acknowledgements</title>
        <p>This research was in part supported by the “National Center for Gene Therapy and Drugs based on RNA Technology”, PNRR-NextGeneration EU program [G43C22001320007], and in part by the MUSA - Multilayered Urban Sustainability Action - Project, funded by the PNRR-NextGeneration EU program ([G43C22001370007], Code ECS00000037).</p>
      </sec>
      <sec id="sec-refs">
        <title>References</title>
        <ref-list>
          <ref id="ref9"><mixed-citation>[9] S. Pan, et al., Unifying large language models and knowledge graphs: A roadmap, 2023. arXiv:2306.08302.</mixed-citation></ref>
          <ref id="ref10"><mixed-citation>[10] S. Ateia, U. Kruschwitz, Is ChatGPT a Biomedical Expert? – Exploring the Zero-Shot Performance of Current GPT Models in Biomedical Tasks, 2023. arXiv:2306.16108.</mixed-citation></ref>
          <ref id="ref11"><mixed-citation>[11] Z. Ji, et al., Survey of hallucination in natural language generation, ACM Computing Surveys 55 (2023) 1–38. doi:10.1145/3571730.</mixed-citation></ref>
          <ref id="ref12"><mixed-citation>[12] A. Ettinger, What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models, Transactions of the Association for Computational Linguistics 8 (2020) 34–48. doi:10.1162/tacl_a_00298.</mixed-citation></ref>
          <ref id="ref13"><mixed-citation>[13] D. A. Natale, et al., The protein ontology: a structured representation of protein forms and complexes, Nucleic Acids Research 39 (2010). doi:10.1093/nar/gkq907.</mixed-citation></ref>
          <ref id="ref14"><mixed-citation>[14] C. Mungall, et al., oborel/obo-relations: 2023-08-18 release, 2023. doi:10.5281/zenodo.8263469.</mixed-citation></ref>
          <ref id="ref15"><mixed-citation>[15] Z. Zhang, et al., ERNIE: Enhanced language representation with informative entities, in: Proc. of Annual Meeting of the Association for Computational Linguistics, 2019, pp. 1441–1451. doi:10.18653/v1/P19-1139.</mixed-citation></ref>
          <ref id="ref16"><mixed-citation>[16] C. Rosset, et al., Knowledge-aware language model pretraining, CoRR (2020). arXiv:2007.00655.</mixed-citation></ref>
          <ref id="ref17"><mixed-citation>[17] P. Lewis, et al., Retrieval-augmented generation for knowledge-intensive NLP tasks, in: Proc. of the 34th Int'l Conf. on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, 2020.</mixed-citation></ref>
          <ref id="ref18"><mixed-citation>[18] M. Danilevsky, et al., A survey of the state of explainable AI for natural language processing, in: Proc. of Int'l Conf. on Natural Language Processing, 2020, pp. 447–459.</mixed-citation></ref>
          <ref id="ref19"><mixed-citation>[19] B. Y. Lin, X. Chen, J. Chen, X. Ren, KagNet: Knowledge-aware graph networks for commonsense reasoning, 2019. arXiv:1909.02151.</mixed-citation></ref>
          <ref id="ref20"><mixed-citation>[20] E. Cavalleri, et al., A meta-graph for the construction of an RNA-centered knowledge graph, in: Bioinformatics and Biomedical Engineering, Springer, 2023, pp. 165–180. doi:10.1007/978-3-031-34953-9_13.</mixed-citation></ref>
          <ref id="ref21"><mixed-citation>[21] G. J. Hannon, RNA interference, Nature 418 (2002) 244–251. doi:10.1038/418244a.</mixed-citation></ref>
          <ref id="ref22"><mixed-citation>[22] L. Guo, et al., miRNA–miRNA interaction implicates for potential mutual regulatory pattern, Gene 511 (2012) 187–194. doi:10.1016/j.gene.2012.09.066.</mixed-citation></ref>
          <ref id="ref23"><mixed-citation>[23] E. C. Lai, et al., Complementary miRNA pairs suggest a regulatory role for miRNA:miRNA duplexes, RNA 10 (2004) 171–175. doi:10.1261/rna.5191904.</mixed-citation></ref>
          <ref id="ref24"><mixed-citation>[24] R. L. Seal, et al., Genenames.org: the HGNC resources in 2023, Nucleic Acids Research 51 (2022) D1003–D1009. doi:10.1093/nar/gkac888.</mixed-citation></ref>
          <ref id="ref25"><mixed-citation>[25] N. A. Vasilevsky, et al., Mondo: Unifying diseases for the world, by the world, medRxiv (2022). doi:10.1101/2022.04.13.22273750.</mixed-citation></ref>
          <ref id="ref26"><mixed-citation>[26] P. N. Robinson, et al., The human phenotype ontology: A tool for annotating and analyzing human hereditary disease, The American Journal of Human Genetics 83 (2008) 610–615. doi:10.1016/j.ajhg.2008.09.017.</mixed-citation></ref>
          <ref id="ref27"><mixed-citation>[27] A. Kozomara, et al., miRBase: from microRNA sequences to function, Nucleic Acids Research 47 (2018) D155–D162. doi:10.1093/nar/gky1141.</mixed-citation></ref>
        </ref-list>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Cavalleri</surname>
          </string-name>
          , et al.,
          <article-title>RNA-KG: An ontology-based knowledge graph for representing interactions involving RNA molecules</article-title>
          ,
          <year>2023</year>
          . arXiv:2312.00183.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Caufield</surname>
          </string-name>
          , et al.,
          <article-title>Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning</article-title>
          ,
          <source>Bioinformatics</source>
          (
          <year>2024</year>
          ). doi:10.1093/bioinformatics/btae104.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bommasani</surname>
          </string-name>
          , et al.,
          <article-title>On the opportunities and risks of foundation models</article-title>
          ,
          <source>CoRR</source>
          abs/2108.07258 (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Moxon</surname>
          </string-name>
          , et al.,
          <article-title>The Linked Data Modeling Language (LinkML): A General-Purpose Data Modeling Framework Grounded in Machine-Readable Semantics</article-title>
          ,
          <source>in: Int'l Conf. on Biomedical Ontologies</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>148</fpage>
          -
          <lpage>151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] OpenAI,
          <article-title>GPT-4 technical report</article-title>
          ,
          <year>2023</year>
          . arXiv:2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          , et al.,
          <article-title>Llama 2: Open foundation and fine-tuned chat models</article-title>
          ,
          <year>2023</year>
          . arXiv:2307.09288.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , et al.,
          <article-title>Mistral 7B</article-title>
          ,
          <year>2023</year>
          . arXiv:2310.06825.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tunstall</surname>
          </string-name>
          , et al.,
          <article-title>Zephyr: Direct Distillation of LM Alignment</article-title>
          ,
          <year>2023</year>
          . arXiv:2310.16944.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>