<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">On the extraction of meaningful RNA interactions from Scientific Publications through LLMs and SPIRES</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Emanuele</forename><surname>Cavalleri</surname></persName>
							<email>emanuele.cavalleri@unimi.it</email>
							<affiliation key="aff0">
								<orgName type="department">AnacletoLab - Dipartimento di Informatica</orgName>
								<orgName type="institution">Università degli Studi di Milano</orgName>
								<address>
									<addrLine>Via Celoria 18</addrLine>
									<settlement>Milano</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marco</forename><surname>Mesiti</surname></persName>
							<email>marco.mesiti@unimi.it</email>
							<affiliation key="aff0">
								<orgName type="department">AnacletoLab - Dipartimento di Informatica</orgName>
								<orgName type="institution">Università degli Studi di Milano</orgName>
								<address>
									<addrLine>Via Celoria 18</addrLine>
									<settlement>Milano</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">On the extraction of meaningful RNA interactions from Scientific Publications through LLMs and SPIRES</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">DEA957F568922DF80E8EC165415FE886</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:17+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>RNA-based technologies</term>
					<term>Knowledge Graphs</term>
					<term>RNA-drug discovery</term>
					<term>Large Language Models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Knowledge graphs (KGs) are useful tools to uniformly represent and integrate heterogeneous information about a domain of interest. However, they are inherently incomplete; therefore, new facts should be introduced by extracting them from structured and unstructured data sources. Starting from RNA-KG, the first KG tailored for representing different kinds of RNA molecules that we recently developed, in this paper we evaluate the use of SPIRES for extracting interactions among bio-entities involving RNA molecules from scientific papers guided by the RNA-KG schema. SPIRES is a general-purpose knowledge extraction system for mining information conforming to a specified schema. A customized prompt is generated and submitted to a Large Language Model (LLM) along with a text to extract a set of RDF triples adhering to the schema constraints. The experiments show a high accuracy in extracting interactions from the scientific literature.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The "RNA world" represents a novel frontier for the study of fundamental biological processes and human diseases and is paving the way for the development of new drugs tailored to the patient's biomolecular characteristics. Although scientific data about coding and non-coding RNA molecules are continuously produced and made available from public repositories, they are scattered across different databases and in the scientific literature. A centralized, uniform, and semantically consistent representation of the knowledge on RNA is still lacking. We have recently constructed RNA-KG <ref type="bibr" target="#b0">[1]</ref>, a knowledge graph integrating biological knowledge about RNA molecules with their functional relationships with genes, proteins, chemicals, and biomedical ontological concepts. RNA-KG includes around 600K nodes and 9M RDF triples representing reliable interactions involving RNA molecules and related biomedical concepts extracted from more than 50 public data sources according to 11 bio-ontologies. RNA-KG is coupled with a meta-graph representing all the possible interactions involving RNA molecules. SPIRES (Structured Prompt Interrogation and Recursive Extraction of Semantics) <ref type="bibr" target="#b1">[2]</ref> is a recently proposed approach to information extraction that exploits Large Language Models (LLMs) <ref type="bibr" target="#b2">[3]</ref> to identify instances of a knowledge schema expressed in terms of LinkML <ref type="bibr" target="#b3">[4]</ref> starting from plain text. Given an input text, it adopts zero-shot learning to identify and extract relevant entities and the relationships among them, which are then normalized and grounded through ontologies and vocabularies. 
SPIRES is a general-purpose approach that can be used across a variety of domains and does not require specific training/tuning on the considered domain. SPIRES adopts an engineering approach for creating prompts for interacting with an LLM (like GPT <ref type="bibr" target="#b4">[5]</ref>, Llama 2 <ref type="bibr" target="#b5">[6]</ref>, Mistral <ref type="bibr" target="#b6">[7]</ref>, and Zephyr <ref type="bibr" target="#b7">[8]</ref>) to improve the quality of the generated responses <ref type="bibr" target="#b8">[9]</ref>. In this way, technical challenges for generative AI (e.g., constructing comprehensive real-world knowledge and improving the accuracy of automated responses) can be addressed.</p><p>In this paper, we discuss the initial experimental results that we obtained by applying SPIRES to the extraction of interactions among bio-entities involving RNA molecules in the context of the PNRR project "Gene Therapy and Drugs based on RNA Technology". The purpose of the experiments is to show the level of accuracy of the system in extracting interactions from the scientific literature and to investigate the possibility of combining RNA-KG with LLMs. Note that the extraction of interactions involving RNA molecules is particularly challenging for two reasons. First, a well-recognized ontology for characterizing non-coding RNA molecules is still lacking; second, different identifiers are adopted for representing the same bio-entity. Even if a more systematic evaluation should be conducted, the initial results are very encouraging.</p><p>The paper is structured as follows. Section 2 describes the SPIRES approach and related approaches that integrate LLMs with knowledge data. Section 3 presents the LinkML schema that we have developed for interacting with SPIRES. Section 4 describes the experimental results, while Section 5 reports concluding remarks. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">SPIRES and Related Work</head><p>The population of a KG by extracting triples from unstructured texts is an interesting research activity, and the advent of LLMs has boosted the interpretation of highly technical languages, as shown on question-answering benchmarks <ref type="bibr" target="#b9">[10]</ref>. However, these techniques have shown several limitations, such as generating incorrect statements due to hallucinations <ref type="bibr" target="#b10">[11]</ref> and insensitivity to negations <ref type="bibr" target="#b11">[12]</ref>, that cannot be tolerated in sensitive domains like precision medicine. SPIRES adopts: 𝑖) the knowledge schema of a specific domain for the generation of prompts for reducing these drawbacks; and 𝑖𝑖) bio-ontologies for enhancing the quality of the produced information. Figure <ref type="figure">1</ref> outlines the SPIRES workflow. SPIRES requires the specification of the knowledge schema expressed in LinkML <ref type="bibr" target="#b3">[4]</ref> to guide the system in the extraction of knowledge. A LinkML schema contains the classes of entities and relationships among them within the specified domain. Classes can also include attributes (e.g., name, type, and list of synonyms) to enrich entity description. The LinkML schema is automatically processed to generate a list of prompts through which SPIRES interacts with an LLM (e.g., GPT-3, GPT-4, Llama 2, Mistral, and Zephyr). Each prompt in the list is submitted to the LLM to collect information that is used to complete the next prompt, possibly consulting the bio-ontologies (e.g., to replace a protein symbol with the corresponding identifier in an ontology). This recursive refinement process improves the quality of the information gathered through the LLM.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 1.</head><p>Suppose we wish to extract proteins from a text. A LinkML expression can be generated for describing the class Protein with its properties and the adopted identification scheme (see Figure <ref type="figure">1</ref>). A prompt is then generated for this class and used for extracting proteins. However, the result obtained by ChatGPT alone (in this case COX20) is not compliant with the Protein class structure. Therefore, SPIRES exploits bio-ontologies (e.g., the PRotein Ontology, PRO <ref type="bibr" target="#b12">[13]</ref>) to obtain an adequate result.</p><p>Furthermore, in case relationships are identified, SPIRES selectively retains only those aligned with the predefined schema that can be grounded to the Relations Ontology (RO <ref type="bibr" target="#b13">[14]</ref>). By exploiting standard identification schemes adopted by the reference bio-ontologies, the system guarantees the generation of triples that can be easily integrated into a biomedical KG.</p><p>SPIRES thus creates and refines prompts to maximize the effectiveness of LLMs by exploiting domain knowledge encapsulated through the description of the classes and relationships that we wish to include in the KG.</p><p>As outlined in <ref type="bibr" target="#b8">[9]</ref>, the explicit and structured information contained in KGs can also be used for improving the knowledge awareness of LLMs. KGs have been used: 𝑖) in the training of the LLM <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref>; 𝑖𝑖) during the inference stage for making available to the LLMs the latest knowledge without retraining <ref type="bibr" target="#b16">[17]</ref>; 𝑖𝑖𝑖) to improve the interpretability of LLMs by explaining the facts <ref type="bibr" target="#b17">[18]</ref> and by enhancing the reasoning process of LLMs <ref type="bibr" target="#b18">[19]</ref>. 
One of the main disadvantages of solution 𝑖) is that enhancing the knowledge contained in the KG requires retraining the model, which is a time-consuming and costly activity. For this reason, approaches of solution 𝑖𝑖) are gaining momentum because they allow the separation of the text space and the knowledge space. In this case, knowledge is injected at the time of inference.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The SPIRES Schema for RNA-KG</head><p>For the creation of the schema needed for the application of SPIRES, we considered the RNA-KG meta-graph <ref type="bibr" target="#b19">[20]</ref> that represents all the kinds of relationships involving RNA molecules in the considered data sources. Starting from it, a UML class diagram was developed that formally describes the schema of the considered domain and can be used for identifying meaningful relationships in the considered domain. Figure <ref type="figure" target="#fig_0">2</ref> shows an excerpt of the generated UML class diagram that consists of four biological and biomedical classes (miRNA, gene, protein, and disease) with six kinds of RO relationships. miRNA molecules are small non-coding RNAs that play a central role in gene expression via interference pathways, and their misregulation is associated with several diseases <ref type="bibr" target="#b20">[21]</ref>. miRNA molecules can generically interact with genes but can also, more precisely, regulate the activity of a gene when a miRNA molecule blocks the translation of a gene or promotes the degradation of the gene's product. Moreover, miRNA molecules can regulate the activity of other miRNAs because they form base-pairing interactions with complementary miRNA molecules according to <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b22">23]</ref>. The schema also contains the relationships involving genes and proteins. Specifically, the has gene product relation and its inverse gene product of are used for representing that different proteins are translated from the same gene (i.e., isoforms), while the regulates activity of relation is used for representing that a subclass of the proteins (transcription factors) regulates the activity of genes, promoting or down-regulating their activity by acting as enhancers or repressors. 
Both proteins and miRNAs are connected to the disease class by the causes or contributes to condition relation. The diagram also contains the main properties that can be associated with these bio-entities (e.g., nucleotide/amino acid sequences, descriptions of molecules/diseases, synonyms).</p><p>The proposed UML class diagram was translated into a LinkML schema. Genes are annotated using HGNC <ref type="bibr" target="#b23">[24]</ref> IDs. This choice is motivated by the stability of the HGNC IDs even if a gene name or symbol changes. Proteins are grounded to the PRotein Ontology (PRO) while diseases are grounded to both the Monarch Disease Ontology (Mondo <ref type="bibr" target="#b24">[25]</ref>) and the Human Phenotype Ontology (HPO <ref type="bibr" target="#b25">[26]</ref>). miRNAs were left with no semantic annotation since miRNA labels (e.g., hsa-let-7b-5p) and miRBase <ref type="bibr" target="#b26">[27]</ref> accession identifiers (e.g., MIMAT0000063) use CURIE prefixes that are not included in the default SPIRES annotators. We can manually retrieve miRNA molecules from the relationships extracted by SPIRES since their labels follow a pattern (for instance, the "hsa-" prefix indicates human miRNAs and the "mmu-" prefix murine miRNAs; mature miRNAs are designated with the "miR-" substring, whilst "mir" refers to the stem-loop primary transcript). Labels can then be easily translated into miRBase accession identifiers using a look-up table.</p></div>
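<div xmlns="http://www.tei-c.org/ns/1.0"><p>The label conventions described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the paper: the function names and the structure of the look-up table are assumptions, and the table holds only the single label/accession pair quoted in the text.</p>
<p>
```python
from typing import Optional

# Species prefixes mentioned in the text ("hsa-" human, "mmu-" murine).
SPECIES_PREFIXES = {"hsa": "human", "mmu": "mouse"}

# Hypothetical look-up table: it contains only the pair quoted in the text.
LABEL_TO_MIRBASE = {"hsa-let-7b-5p": "MIMAT0000063"}


def species_of(label: str) -> Optional[str]:
    """Return the species encoded by the label prefix, or None if unknown."""
    prefix = label.split("-", 1)[0]
    return SPECIES_PREFIXES.get(prefix)


def transcript_form(label: str) -> str:
    """Classify a label as mature ('miR-'), stem-loop ('mir-'), or unspecified.

    Labels that carry neither substring (e.g., the let-7 family) are
    reported as 'unspecified' rather than guessed.
    """
    if "-miR-" in label:
        return "mature"
    if "-mir-" in label:
        return "stem-loop"
    return "unspecified"


def to_mirbase(label: str) -> Optional[str]:
    """Translate a label into a miRBase accession, or None if not in the table."""
    return LABEL_TO_MIRBASE.get(label)
```
</p>
<p>For example, to_mirbase("hsa-let-7b-5p") returns "MIMAT0000063", while a label missing from the table yields None and would need manual curation.</p></div>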
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 2.</head><p>A LinkML class used to specify causes or contributes to condition relationships between proteins and diseases is reported in Listing 1. In the expression, we have to specify the need to extract triples representing relationships between proteins and diseases in which the only admitted predicate is causes or contributes to condition (RO:0003302). In the expression, samples of the kinds of relationships that we wish to extract are reported. The prompt generated for this class relies on the prompts generated for the classes protein and disease and used for the identification of these bio-entities from the scientific literature. Figure <ref type="figure" target="#fig_1">3</ref> shows an output obtained by using SPIRES and the corresponding result obtained by the simple application of ChatGPT. In the SPIRES output, the extracted interactions are already represented as triples that exploit the required identification scheme. Therefore, checking their presence in RNA-KG and, in case of new triples, their integration is facilitated.</p></div>
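<div xmlns="http://www.tei-c.org/ns/1.0"><p>The constraint stated in Example 2 (protein subject, disease object, RO:0003302 as the only admitted predicate) can be illustrated with a small filter over candidate triples. This is a sketch under stated assumptions and not SPIRES code: the Triple type, the function names, and the use of the CURIE prefixes PR, MONDO, and HP to recognise grounded entities are illustrative, and the identifiers in the sample data are placeholders that were not looked up in any ontology.</p>
<p>
```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Only admitted predicate: causes or contributes to condition.
ALLOWED_PREDICATE = "RO:0003302"
# Assumed CURIE prefixes for grounded proteins (PRO) and diseases (Mondo, HPO).
SUBJECT_PREFIXES = ("PR:",)
OBJECT_PREFIXES = ("MONDO:", "HP:")


def conforms(triple: Triple) -> bool:
    """True if the triple matches the protein-disease pattern of the schema."""
    return (
        triple.predicate == ALLOWED_PREDICATE
        and triple.subject.startswith(SUBJECT_PREFIXES)
        and triple.obj.startswith(OBJECT_PREFIXES)
    )


def keep_conforming(triples: List[Triple]) -> List[Triple]:
    """Discard every triple that does not adhere to the schema pattern."""
    return [t for t in triples if conforms(t)]


# Placeholder data: none of these identifiers refers to a real ontology term.
EXAMPLE_TRIPLES = [
    Triple("PR:P1", "RO:0003302", "MONDO:D1"),   # kept
    Triple("PR:P2", "RO:0003302", "HP:H1"),      # kept
    Triple("COX20", "RO:0003302", "MONDO:D1"),   # dropped: ungrounded symbol
    Triple("PR:P1", "RO:PLACEHOLDER", "MONDO:D1"),  # dropped: other predicate
]
```
</p>
<p>Calling keep_conforming(EXAMPLE_TRIPLES) retains only the first two triples, which mirrors the idea that relations outside the schema or without grounded identifiers are not passed on to the KG.</p></div>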
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental results</head><p>In this section we discuss the experiments that we carried out to evaluate SPIRES for extracting interactions involving RNA molecules. Moreover, we compare SPIRES with ChatGPT (ver. GPT-3.5-turbo), which is the LLM internally integrated in SPIRES, and with Llama 2 (ver. llama-2-70b-chat), another well-known and widely used LLM.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Corpus of Annotated Documents</head><p>To evaluate the extraction of relations aligned with the meta-graph depicted in Figure <ref type="figure" target="#fig_0">2</ref>, we manually selected a corpus of 60 scientific articles gathered from PubMed, ResearchGate, and Google Scholar by specifying keyword-based queries like: "disease", "comorbidity", "protein", "miRNA", "miRNA regulation", "gene". From these documents, we identified paragraphs containing useful information to be extracted (e.g., abstract, discussion, or specific subsections within the domain of interest). In the identification of the paragraphs we have taken into account the following guidelines: 𝑖) the paragraph should contain different kinds of relations between bio-entities (e.g., "miRNA-interacts with-gene" and "miRNA-regulates activity of-gene") to evaluate the ability of SPIRES to identify the right relations according to the provided meta-graph; 𝑖𝑖) the paragraph might also contain irrelevant relationships that should be discarded; 𝑖𝑖𝑖) different identification schemes can be used in the paragraph to check the ability of SPIRES to correctly work with them.</p><p>Listing 1: LinkML template for protein-disease interaction.</p><p>Paragraphs have been classified according to the kind of bio-entities that they describe and associated with the list of relationships that should be identified according to the adopted meta-graph. For each kind of bio-entity, the following table shows the number of paragraphs containing relationships involving it (note that a paragraph can involve more than one kind of bio-entity).</p><table><row><cell>Protein</cell><cell>Disease</cell><cell>miRNA</cell><cell>Gene</cell></row><row><cell>44</cell><cell>58</cell><cell>37</cell><cell>21</cell></row></table><p>In the considered paragraphs, we have identified six kinds of interactions among the considered bio-entities (reported in the y-axis of the diagram in Figure <ref type="figure">4</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Accuracy of Interactions extraction</head><p>For evaluating the obtained predictions, we have used standard metrics (precision, recall, and F-score) by counting the True Positives (TP), False Positives (FP), and False Negatives (FN) with respect to the manually tagged paragraphs. Table <ref type="table" target="#tab_2">1</ref> reports the obtained results for the considered interactions ordered according to the F-score measure. The obtained results indicate a consistent trend where recall tends to be lower than precision due to the prevalence of false negatives over false positives. We think this behavior is due to the difficulty in accurately extracting precise relationships from text, especially in distinguishing specific types of relationships. Furthermore, we observe that disease-disease and miRNA-disease interactions present a high F-score. These kinds of interactions are widely studied in the literature and thus a higher number of publications are available with respect to other interactions (like miRNA-miRNA interactions). Consequently, the abundance of these kinds of relationships contributes to a higher true positive rate. Conversely, the F-score for protein-disease relations is notably low because it is influenced by low recall. We noticed that many protein-disease relations are undetected, often because they are expressed in complex ways within the text. One example is the interchangeable use of separators like "/" and ", " (e.g., "overexpressions in IL6/MEGF8/RELA, and also TP53 are known to cause osteoporosis"). Additionally, mapping proteins to PRO proves challenging when textual information is sparse or ambiguously expressed. For instance, the mention of "PMP-22" solely as "myelin protein 22" instead of "peripheral myelin protein 22" (due to assumptions made by authors) can lead to inaccurate grounding. 
Despite this, precision remains remarkably high and, in the biomedical context, this is preferable because it prioritizes certainty over ambiguity.</p><p>We also compared our results with the average results achieved by SPIRES in other domains. We observed a marginal improvement over the results reported for named entity recognition of chemicals and diseases <ref type="bibr" target="#b1">[2]</ref>. We believe that the slightly enhanced accuracy is due to the use of multiple ontology annotators such as PRO for proteins, Mondo and HPO for diseases, and RO for relations.</p></div>
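<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a worked illustration of the metrics used above, the following Python sketch recomputes precision, recall, and F-score from the TP, FP, and FN counts listed in Table 1. The function and variable names are illustrative assumptions, and the F-score is taken here to be the usual harmonic mean of precision and recall; recomputed values may differ from the table in the last digit depending on how rounding is applied.</p>
<p>
```python
from typing import Dict, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and their harmonic mean from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# (TP, FP, FN) counts copied from Table 1.
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "disease-disease": (54, 5, 10),
    "miRNA-disease": (123, 20, 31),
    "miRNA-miRNA": (19, 1, 7),
    "gene-protein": (52, 5, 21),
    "miRNA-gene": (14, 3, 5),
    "protein-disease": (42, 7, 60),
}


def report(counts: Dict[str, Tuple[int, int, int]]) -> Dict[str, Tuple[float, float, float]]:
    """Return (precision, recall, F-score) per interaction, rounded to two decimals."""
    rows = {}
    for name, (tp, fp, fn) in counts.items():
        p, r, f = precision_recall_f1(tp, fp, fn)
        rows[name] = (round(p, 2), round(r, 2), round(f, 2))
    return rows


if __name__ == "__main__":
    for name, (p, r, f) in report(COUNTS).items():
        print(f"{name}: precision={p} recall={r} f-score={f}")
```
</p>
<p>For protein-disease, the 60 false negatives against 42 true positives give a recall of 0.41 while precision stays at 0.86, which is the pattern discussed in the text.</p></div>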
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Comparison with other LLMs</head><p>For assessing the performance of SPIRES with respect to ChatGPT and Llama 2, we focused on a subset of 20 texts. The prompt submitted to the two LLMs does not guarantee that identifiers are returned for the subject and the object of the triples. However, if we try to generate a further prompt with the explicit request of mapping the extracted concepts to appropriate terminologies, both ChatGPT and Llama 2 advise that the provided ontology identifiers are hypothetical and may not correspond to actual ontology identifiers (so, hallucinations can occur in this case). Therefore we decided to substitute the grounding process with our manually curated look-up tables <ref type="bibr" target="#b0">[1]</ref>.</p><p>When using ChatGPT (or Llama 2) alone, we do not have to specify the schema, and results are produced through a single interaction with the user. Avoiding the specification of the schema might be interpreted as an advantage of basic LLM approaches, but it is not. Indeed, the schema allows us to restrict the relationships to be extracted to only those that are meaningful in the considered domain. Finally, no lookup table can be exploited for translating class instance names with the corresponding identifiers in the bio-ontologies (thus requiring a manual identification of the identifiers). All these drawbacks are avoided by the use of SPIRES.</p><p>As shown in the bottom part of Figure <ref type="figure" target="#fig_2">5</ref>, SPIRES outperforms ChatGPT or Llama 2 alone both in terms of precision and recall. The histogram in Figure <ref type="figure" target="#fig_2">5</ref> shows a marked increase in the TP rate and a considerable decrease in the FP and FN rates when adopting SPIRES instead of ChatGPT or Llama 2 alone for extracting relations that adhere to a specified schema within texts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Concluding remarks</head><p>In this paper, we have reported the initial experimentation of the use of SPIRES for extracting triples from the scientific literature related to RNA molecules by taking advantage of the meta-graph we have developed for the generation of RNA-KG. Even if a more systematic analysis is required, the initial results are quite encouraging. To facilitate the reproducibility of our results, our dataset and the LinkML template can be downloaded from: https://doi.org/10.5281/zenodo.10671796.</p><p>As future work, we would like to extend the approach by integrating the entire RNA-KG in different ways. First, we will exploit the RNA-KG triples for enhancing the prompts generated by SPIRES. Moreover, RNA-KG can be used for validating the plausibility of the generated triples by using RNA-KG as a gold standard in the area. Furthermore, we will explore the KG-enhanced LLM inference approaches in combination with SPIRES for further improving the precision of the system by injecting knowledge extracted from RNA-KG at inference time. Finally, we would like to create a web environment for graphically showing to the user the predicted triples directly in the graphical representation of the portion of the knowledge graph that will contain them. The user can thus manually check the proposed triples and provide feedback that will be handled afterward to improve the quality of the predictions.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Meta-graph of test to evaluate the capabilities of SPIRES.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Example of output for SPIRES and ChatGPT.</figDesc><graphic coords="4,304.57,84.19,197.42,88.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: SPIRES vs Llama 2 vs ChatGPT on 20 texts.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1</head><label>1</label><figDesc>Results for named entity recognition evaluation of SPIRES on relations involving protein, miRNA, disease, and gene entities. Grounding was performed against HGNC, PRO, Mondo, HPO, and RO. TP, FP, and FN results for evaluation of SPIRES on relations involving protein, miRNA, disease, and gene entities.</figDesc><table><row><cell></cell><cell># Paragraphs</cell><cell>TP</cell><cell>FP</cell><cell>FN</cell><cell>F-score</cell><cell>Precision</cell><cell>Recall</cell></row><row><cell>disease-disease</cell><cell>16</cell><cell>54</cell><cell>5</cell><cell>10</cell><cell>0.88</cell><cell>0.92</cell><cell>0.84</cell></row><row><cell>miRNA-disease</cell><cell>32</cell><cell>123</cell><cell>20</cell><cell>31</cell><cell>0.82</cell><cell>0.86</cell><cell>0.80</cell></row><row><cell>miRNA-miRNA</cell><cell>1</cell><cell>19</cell><cell>1</cell><cell>7</cell><cell>0.82</cell><cell>0.95</cell><cell>0.73</cell></row><row><cell>gene-protein</cell><cell>10</cell><cell>52</cell><cell>5</cell><cell>21</cell><cell>0.8</cell><cell>0.91</cell><cell>0.71</cell></row><row><cell>miRNA-gene</cell><cell>13</cell><cell>14</cell><cell>3</cell><cell>5</cell><cell>0.78</cell><cell>0.82</cell><cell>0.74</cell></row><row><cell>protein-disease</cell><cell>24</cell><cell>42</cell><cell>7</cell><cell>60</cell><cell>0.56</cell><cell>0.86</cell><cell>0.41</cell></row><row><cell>Total</cell><cell>(60 texts)</cell><cell>304</cell><cell>41</cell><cell>134</cell><cell>0.76</cell><cell>0.88</cell><cell>0.69</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This research was in part supported by the "National Center for Gene Therapy and Drugs based on RNA Technology", PNRR-NextGeneration EU program [G43C22001320007] and in part by the MUSA - Multilayered Urban Sustainability Action - Project, funded by the PNRR-NextGeneration EU program ([G43C22001370007], Code ECS00000037).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">RNA-KG: An ontology-based knowledge graph for representing interactions involving RNA molecules</title>
		<author>
			<persName><forename type="first">E</forename><surname>Cavalleri</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2312.00183</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Caufield</surname></persName>
		</author>
		<idno type="DOI">10.1093/bioinformatics/btae104</idno>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">On the opportunities and risks of foundation models</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bommasani</surname></persName>
		</author>
		<idno>CoRR abs/2108.07258</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">The Linked Data Modeling Language (LinkML): A General-Purpose Data Modeling Framework Grounded in Machine-Readable Semantics</title>
		<author>
			<persName><forename type="first">S</forename><surname>Moxon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int&apos;l Conf. on Biomedical Ontologies</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="148" to="151" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">GPT-4 technical report</title>
		<author>
			<persName><surname>OpenAI</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.08774</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.09288</idno>
		<title level="m">Llama 2: Open foundation and finetuned chat models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Mistral 7B</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Q</forename><surname>Jiang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.06825</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Tunstall</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.16944</idno>
		<title level="m">Zephyr: Direct Distillation of LM Alignment</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Pan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2306.08302</idno>
		<title level="m">Unifying large language models and knowledge graphs: A roadmap</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Is ChatGPT a Biomedical Expert? - Exploring the Zero-Shot Performance of Current GPT Models in Biomedical Tasks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ateia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Kruschwitz</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2306.16108</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Survey of hallucination in natural language generation</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Ji</surname></persName>
		</author>
		<idno type="DOI">10.1145/3571730</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page" from="1" to="38" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ettinger</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00298</idno>
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="34" to="48" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">The protein ontology: a structured representation of protein forms and complexes</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Natale</surname></persName>
		</author>
		<idno type="DOI">10.1093/nar/gkq907</idno>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Mungall</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.8263469</idno>
		<title level="m">oborel/obo-relations</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">ERNIE: Enhanced language representation with informative entities</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1139</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. of Annual Meeting of the Association for Computational Linguistics</title>
		<meeting>Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1441" to="1451" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Rosset</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2007.00655</idno>
		<title level="m">Knowledge-aware language model pretraining</title>
		<imprint>
			<publisher>CoRR</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Retrieval-augmented generation for knowledge-intensive NLP tasks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 34th Int&apos;l Conf. on Neural Information Processing Systems</title>
		<meeting>34th Int&apos;l Conf. on Neural Information Processing Systems<address><addrLine>Red Hook, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Curran Associates Inc</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A survey of the state of explainable AI for natural language processing</title>
		<author>
			<persName><forename type="first">M</forename><surname>Danilevsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of Int&apos;l Conf. on Natural Language Processing</title>
		<meeting>Int&apos;l Conf. on Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="447" to="459" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ren</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.02151</idno>
		<title level="m">KagNet: Knowledge-aware graph networks for commonsense reasoning</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">A meta-graph for the construction of an RNA-centered knowledge graph</title>
		<author>
			<persName><forename type="first">E</forename><surname>Cavalleri</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-34953-9_13</idno>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics and Biomedical Engineering</title>
		<imprint>
			<biblScope unit="page" from="165" to="180" />
			<date type="published" when="2023">2023</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">RNA interference</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J</forename><surname>Hannon</surname></persName>
		</author>
		<idno type="DOI">10.1038/418244a</idno>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">418</biblScope>
			<biblScope unit="page" from="244" to="251" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">miRNA-miRNA interaction implicates for potential mutual regulatory pattern</title>
		<author>
			<persName><forename type="first">L</forename><surname>Guo</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.gene.2012.09.066</idno>
	</analytic>
	<monogr>
		<title level="j">Gene</title>
		<imprint>
			<biblScope unit="volume">511</biblScope>
			<biblScope unit="page" from="187" to="194" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Complementary miRNA pairs suggest a regulatory role for miRNA:miRNA duplexes</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">C</forename><surname>Lai</surname></persName>
		</author>
		<idno type="DOI">10.1261/rna.5191904</idno>
	</analytic>
	<monogr>
		<title level="j">RNA</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="171" to="175" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Genenames.org: the HGNC resources in 2023</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Seal</surname></persName>
		</author>
		<idno type="DOI">10.1093/nar/gkac888</idno>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page" from="D1003" to="D1009" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Mondo: Unifying diseases for the world, by the world</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Vasilevsky</surname></persName>
		</author>
		<idno type="DOI">10.1101/2022.04.13.22273750</idno>
	</analytic>
	<monogr>
		<title level="j">medRxiv</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">The human phenotype ontology: A tool for annotating and analyzing human hereditary disease</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Robinson</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ajhg.2008.09.017</idno>
	</analytic>
	<monogr>
		<title level="j">The American Journal of Human Genetics</title>
		<imprint>
			<biblScope unit="volume">83</biblScope>
			<biblScope unit="page" from="610" to="615" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">miRBase: from microRNA sequences to function</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kozomara</surname></persName>
		</author>
		<idno type="DOI">10.1093/nar/gky1141</idno>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="page" from="D155" to="D162" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
