<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">AI Research Assistant</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Mihai</forename><surname>Gheorghe</surname></persName>
							<email>gheorghe@csie.ase.ro</email>
							<affiliation key="aff0">
								<orgName type="institution">Bucharest University of Economic Studies</orgName>
								<address>
									<addrLine>Piața Romană 6</addrLine>
									<settlement>Bucharest</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Cătălina</forename><surname>Chinie</surname></persName>
							<email>catalina.chinie@fabiz.ase.ro</email>
							<affiliation key="aff0">
								<orgName type="institution">Bucharest University of Economic Studies</orgName>
								<address>
									<addrLine>Piața Romană 6</addrLine>
									<settlement>Bucharest</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dumitru</forename><surname>Roman</surname></persName>
							<email>dumitru.roman@sintef.no</email>
							<affiliation key="aff0">
								<orgName type="institution">Bucharest University of Economic Studies</orgName>
								<address>
									<addrLine>Piața Romană 6</addrLine>
									<settlement>Bucharest</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">SINTEF AS</orgName>
								<address>
									<settlement>Oslo</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">AI Research Assistant</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">6DB55E3073AA2DF313063568381C18AA</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:27+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Artificial Intelligence</term>
					<term>Retrieval Augmented Generation</term>
					<term>Automated literature review</term>
					<term>Information extraction</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The increasing volume of scientific literature and the abundance of publicly accessible data present a substantial hurdle for researchers aiming to stay informed and effectively derive valuable insights. In this paper we discuss the use of LLMs in the context of extracting information from scientific literature and introduce an AI-driven Research Assistant that uses custom Retrieval Augmented Generation (RAG) as a Service and other techniques to streamline processes such as literature review, information extraction, and knowledge discovery.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Large Language Models (LLMs) have gained significant attention across diverse industries, with their remarkable reasoning abilities enabling time savings and idea generation in numerous domains. Scientific research stands to benefit greatly; however, the probabilistic nature of LLM inference can lead to inaccurate responses to specialized queries. To address this, the paradigm of Retrieval Augmented Generation (RAG) has emerged in recent years. RAG enhances LLM query context by retrieving factual information from external sources such as documents, databases, and APIs, thus mitigating sole reliance on the LLM's training data <ref type="bibr" target="#b0">[1]</ref>.</p><p>Furthermore, the RAG paradigm not only improves the accuracy of LLM outputs but also expands their capabilities by enabling access to real-time information and specialized knowledge bases. This dynamic integration of external information allows LLMs to evolve beyond their static training data and stay abreast of the latest developments in rapidly changing fields such as scientific research.</p><p>While RAG offers significant improvements in LLM performance, it is not without limitations. The common approach of retrieving the top-k semantically similar passages based on vector embeddings and metrics like cosine similarity falls short on queries that demand multi-hop reasoning or traversal of complex relationships. Consider the query: "What is the GDP of the country where the world's highest mountain peak is located?" 
While a RAG system built on geography data may accurately answer "What is the highest mountain peak in the world?", the full query additionally requires identifying the country associated with the mountain and then retrieving its GDP, a task potentially beyond simple semantic similarity matching.</p><p>A variety of techniques are being explored to overcome these limitations <ref type="bibr" target="#b1">[2]</ref>. These include:</p><p>• Reranking: Employing a two-step retrieval process that first retrieves a larger set of candidates using efficient methods like cosine similarity, then reranks them using more sophisticated techniques such as cross-encoders or even LLMs themselves, to better capture relevance and address multi-hop reasoning. • Hybrid RAG: Combining semantic similarity-based retrieval with knowledge graphs to incorporate explicit relationships and facilitate more complex reasoning <ref type="bibr" target="#b2">[3]</ref>. • Large-Context LLMs: Models such as Claude, with its 200k-token context, or Gemini, with its 1M+-token context, can significantly enhance RAG systems by incorporating more extensive context directly into the model. This reduces the need for explicit retrieval of external information in many cases, allowing the LLM to draw upon a broader knowledge base to understand and reason over larger chunks of information <ref type="bibr" target="#b3">[4]</ref>. • Hierarchical Embeddings: Leveraging embeddings of the original text alongside various levels of summaries, as in the RAPTOR model, to enhance retrieval accuracy and efficiency <ref type="bibr" target="#b4">[5]</ref>. • Multi-Hop Query Answering: Decomposing complex queries into simpler sub-queries and employing techniques like Chain-of-Thought prompting to guide LLMs through multi-step reasoning <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8</ref>]. 
• Autonomous Agents: Utilizing AI agents to navigate diverse information sources and construct more intricate prompts for the LLM, incorporating logic and multi-step reasoning <ref type="bibr" target="#b8">[9]</ref>. • Self-RAG: An approach in which the LLM itself participates in the retrieval process, potentially leading to more adaptive and context-aware retrieval <ref type="bibr" target="#b9">[10]</ref>.</p><p>These advancements highlight the active research and development in the RAG paradigm, aiming to address its limitations and enable LLMs to tackle increasingly complex and nuanced information needs.</p><p>Despite these advancements, constructing a RAG system specifically for scientific research remains a difficult task. Challenges arise in handling the diverse content types commonly found in research papers, such as tables, images, and formulas, which often necessitate conversion into natural-language text for effective retrieval. Moreover, the research process often involves snowballing, where the corpus of relevant references expands iteratively from an initial set of studies.</p><p>A review of existing AI assistants and RAG systems tailored towards researchers reveals the lack of a universally applicable solution, although several platforms demonstrate advanced techniques and some even appear customized for scientific research. Existing solutions in this area can be categorized into three distinct classes:</p><p>• AI Document Assistants: Constituting the most prevalent category, these solutions range from freely available to premium licensed offerings. While they leverage cutting-edge LLMs and enable users to upload documents in various formats, answer questions based on those documents, and perform summarization, they often exhibit limitations concerning academic research support. 
These constraints include operating on a limited number of documents (typically restricted by the LLM's context window), lacking specialization in scientific papers, and generally not performing multi-hop reasoning across multiple documents. As for Clarivate AI Academia, recently announced by Clarivate <ref type="bibr" target="#b11">[21]</ref>, detailed information about its features and performance remains limited; however, given Clarivate's established track record, the product has strong potential to become a noteworthy contender in this space.</p><p>To overcome the above-mentioned limitations, we initiated the development of a RAG-as-a-Service Research Assistant whose features we briefly introduce in the following section.</p></div>
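As a concrete illustration of the stage-one retrieval discussed above, here is a minimal sketch of top-k cosine-similarity search. The toy three-dimensional vectors stand in for real embedding-model output, and the reranking stage is omitted; all names and data here are illustrative, not part of the described system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, passages, k=2):
    """Stage 1: return the k passages whose embeddings are closest to the query."""
    scored = [(cosine(query_vec, vec), text) for text, vec in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy "embeddings" standing in for output of a real embedding model.
passages = [
    ("Mount Everest is the highest peak.",     [0.9, 0.1, 0.0]),
    ("Nepal's GDP was about 40bn USD.",        [0.2, 0.8, 0.1]),
    ("The Danube flows into the Black Sea.",   [0.0, 0.1, 0.9]),
]
query = [0.7, 0.6, 0.0]  # a query touching both the peak and the GDP facts
print(top_k(query, passages, k=2))
```

Note how the GDP passage ranks second rather than being composed with the Everest passage; bridging the two facts is exactly the multi-hop step that plain similarity search does not perform.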
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">AI Research Assistant</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Dynamic corpus construction</head><p>Researchers can upload their own PDF papers or initiate automated downloads for specific queries via the arXiv and Semantic Scholar APIs, gaining access to millions of papers. GROBID processes the papers, producing structured XML representations with clearly defined sections, figures, tables, and references <ref type="bibr" target="#b12">[22]</ref>. The automated download function can also recursively expand the corpus by extracting references from the initial document set.</p></div>
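The automated download step could, for example, query the public arXiv API as sketched below. The endpoint and query parameters follow arXiv's documented interface, but `arxiv_query_url` is an illustrative helper; the Semantic Scholar call and the recursive reference expansion are omitted.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public Atom feed endpoint

def arxiv_query_url(query: str, start: int = 0, max_results: int = 10) -> str:
    """Build a request URL for the arXiv API; the response is an Atom XML
    feed whose entries carry title, abstract, authors, and PDF links."""
    params = {
        "search_query": f"all:{query}",
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

print(arxiv_query_url("retrieval augmented generation", max_results=5))
```

Fetching the returned URL and parsing the Atom feed yields the PDF links that feed the GROBID processing stage.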
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Retrieval and question-answering</head><p>Full plain text sections from papers are indexed in ChromaDB [23] using cutting-edge BAAI/bge-m3 [24] dense embeddings. Deviating from most Q&amp;A RAG systems, we employ large paragraph chunks (often entire chapters/sections) to maintain context. Oversized paragraphs are divided into subsections while preserving sentence integrity. Question answering employs cosine similarity retrieval, with results re-ranked using BAAI/bge-reranker-v2-m3 <ref type="bibr">[25]</ref>. In cases where no relevant documents are retrieved, the system transparently informs the researcher that the answer is not grounded in the corpus. Each answer is accompanied by source paper sections, promoting transparency and facilitating further exploration.</p></div>
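The sentence-preserving splitting of oversized sections described above can be sketched as follows; a naive regex sentence splitter stands in for whatever segmenter the system actually uses, and `chunk_section` with its character limit is an illustrative assumption.

```python
import re

def chunk_section(text: str, max_chars: int = 500) -> list[str]:
    """Split an oversized section into chunks no longer than max_chars,
    cutting only at sentence boundaries so no sentence is ever split."""
    # Naive splitter: a ./!/? followed by whitespace ends a sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) + 1 > max_chars:
            chunks.append(current)   # flush the full chunk
            current = sent
        else:
            current = f"{current} {sent}".strip() if current else sent
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then gets its own dense embedding, so retrieval returns whole sections or section-sized pieces rather than context-free fragments.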
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Custom information extraction</head><p>Our AI Research Assistant facilitates the extraction of custom-structured information from scientific papers by generating valid JSON schemas from natural language queries. The standard schema extracts a comprehensive set of data, including definitions, indicators, hypotheses, key findings, topics, and summaries, from each paper. These summaries can also be indexed within the vector database, enabling the system to respond to high-level conceptual queries, in contrast to the specific questions grounded in isolated paper sections that are addressed by the previously mentioned Q&amp;A RAG features. Cost analysis indicates that leveraging a Model as a Service (for instance, Claude Haiku <ref type="bibr">[26]</ref>) for information extraction incurs an estimated cost of 1 USD per 250 papers.</p></div>
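Enforcing a JSON schema on LLM output can be sketched as below. The `SCHEMA` fields mirror some of those named above, but the helper and its validation rules are illustrative assumptions, not the system's actual implementation.

```python
import json

# Hypothetical extraction schema; the real system generates such
# schemas from natural-language queries.
SCHEMA = {
    "hypotheses": list,
    "key_findings": list,
    "summary": str,
}

def parse_extraction(raw: str) -> dict:
    """Parse an LLM response and enforce the schema: every field must be
    present and of the declared type, otherwise the output is rejected."""
    data = json.loads(raw)
    for field, expected in SCHEMA.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field {field!r} missing or not {expected.__name__}")
    return data

good = '{"hypotheses": ["H1"], "key_findings": ["F1"], "summary": "Short."}'
print(parse_extraction(good)["summary"])
```

Rejected outputs can simply be retried; with temperature 0 and a schema in the prompt, validated JSON is what makes the extraction results machine-usable downstream.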
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Architecture and scalability</head><p>The system adopts a decoupled multi-server architecture for scalability, with LangChain [27] partially managing orchestration. A GPU-intensive machine is required for vector embedding, re-ranking, and local inference when utilizing local LLMs. The workflow exposes API endpoints that can be consumed by a web application to manage user access.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.">AI Agents</head><p>The project incorporates agentic behavior, utilizing a ZERO-SHOT Classifier to direct user queries to the most suitable tool. These tools include classic RAG/Q&amp;A based on articles, structured information extraction, and deterministic queries to SQL-like datasets (e.g., locally hosted EUROSTAT data). LangChain's agentic implementation allows for chaining multiple tools in a single user query. Therefore, an answer to a user query can be grounded in both the scientific corpus and a relevant dataset.</p></div>
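The routing behavior described above can be illustrated with a toy dispatcher; hard-coded keyword rules stand in for the zero-shot classifier, and the tool names are hypothetical.

```python
# Toy stand-in for the zero-shot classifier that routes queries to tools.
TOOLS = {
    "qa":      "RAG question answering over the paper corpus",
    "extract": "structured information extraction",
    "sql":     "deterministic queries over SQL-like datasets",
}

def route(query: str) -> list[str]:
    """Return the tool chain for a query; a query touching both papers
    and datasets gets two tools, mirroring chained tool use."""
    chain = []
    q = query.lower()
    if any(w in q for w in ("paper", "study", "literature")):
        chain.append("qa")
    if any(w in q for w in ("gdp", "eurostat", "dataset")):
        chain.append("sql")
    return chain or ["qa"]  # default to corpus Q&A

print(route("Compare the papers' findings with EUROSTAT GDP data"))
```

In the described system the classifier and chaining are handled by LangChain's agent machinery; the point of the sketch is only that one query can fan out to several grounded tools whose results are combined in the answer.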
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Relevance to Rule-Based AI, Decisions, and Reasoning</head><p>Our approach achieves deterministic and explainable results through:</p><p>• Deterministic LLM use (temperature set to 0)</p><p>• Grounding question-answering in the scientific literature corpus and indicating the paper sources alongside the answer • Enforcing JSON schemas for structured information extraction • Verbose mode for the agentic workflow, enhancing explainability • Integration of conventional programming tools for querying structured data sources</p><p>This research assistant offers a significant contribution to AI-powered literature analysis, providing researchers with a valuable tool for navigating the expansive landscape of scientific knowledge.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Further directions</head><p>We aim to integrate image and figure extraction from papers, leveraging multi-modal LLMs to enrich the dataset. Open models such as PaliGemma <ref type="bibr">[28]</ref> or Idefics <ref type="bibr">[29]</ref> can do image-to-text inference locally on reasonably accessible hardware.</p><p>Related to multi-hop reasoning, we plan to employ techniques like query decomposition and graph neural networks to address complex, multi-step queries.</p><p>Additionally, we plan to develop more agentic tools capable of handling various datasets, expanding the system's capabilities and adaptability.</p><p>In the mid to long term, we intend to enhance our system by integrating knowledge graph retrieval with the existing semantic similarity-based approach. This knowledge graph will be constructed during the paper processing and parsing stage, extracting relevant entities like topics, affiliations, and named entities as nodes. This hybrid approach aims to facilitate more nuanced and complex query handling, enabling the AI Research Assistant to better understand and leverage the intricate relationships within scientific literature.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>An open-source RAG engine with a comprehensive feature set and a user-friendly, modern interface[11]. It offers partial customization in terms of embedding models and LLMs, alongside fine-grained control over chunking strategies. It also provides tools for constructing custom AI agents. However, as a general-purpose product, it relies on paragraph-length chunking rather than semantic or logical separation. It lacks automated corpus construction and multimodal capabilities. Despite these limitations, the product shows promise due to its ongoing development.• Vectara: A company specializing in RAG solutions[12], with a strong reputation in the field. 
However, they do not offer an out-of-the-box, readily available academic RAG-as-a-Service product, instead offering customized software on demand. • Other solutions: Several other solutions exist, such as Humata AI [13], Digilist [14], Weaviate Verba [15], Anything LLM [16], and RAGify [17]. However, these options lack certain crucial features, including multi-hop capabilities, custom structured information extraction, re-ranking, traceability, and agentic behavior.</figDesc><table /><note>• General purpose RAG as a Service: Augmenting document assistants with retrieval capabilities enables access to a substantially larger corpus of documents. Solutions within this category encompass both commercial and freely available options. The following are noteworthy examples: • RAGflow: • Academic Research Specialized Assistants: This category comprises solutions specifically designed for academic research. Notable examples include: • Sakana AI Scientist: Primarily focused on autonomous generation of complete academic papers [18], its Q&amp;A capabilities are not the central feature. Although the demonstrated results are impressive, concerns persist regarding benchmarks for factuality and ethical considerations. Similarly, Insilico's Dora [19] also generates full papers without chat or Q&amp;A functionalities, and appears to have fewer features compared to Sakana. Unriddle.ai [20], another solution in this category, generates full papers and even offers LaTeX rendering, but lacks traceability, multi-hop capabilities, LLM or embedding model agnosticism, and structured information extraction. Notably, these solutions do not operate as traditional RAG as a Service platforms. • Clarivate AI Academia: Recently announced by Clarivate</note></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgement</head><p>This research is financed under the Romanian National Recovery and Resilience Plan, by the Romanian Government, under the contract number 268/29.11.2022, Entitled "CAUSEFINDER -CAUSALITY IN THE ERA OF BIG DATA".</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Retrieval-augmented generation for knowledge-intensive NLP tasks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Piktus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Karpukhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2005.11401</idno>
	</analytic>
	<monogr>
		<title level="m">NIPS &apos;20: Proceedings of the 34th International Conference on Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="9459" to="9474" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Searching for Best Practices in Retrieval-Augmented Generation</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Xu</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2407.01219</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction</title>
		<author>
			<persName><forename type="first">B</forename><surname>Sarmah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Patel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pasquali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mehta</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2408.04948</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Mei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bendersky</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2407.16833</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Raptor: Recursive abstractive processing for tree-organized retrieval</title>
		<author>
			<persName><forename type="first">P</forename><surname>Sarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Abdullah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tuli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Khanna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Goldie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2401.18059</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2407.13101</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Multihop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2401.15391</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval</title>
		<author>
			<persName><forename type="first">W</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Iyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2009.12756</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Multi-Agent RAG Chatbot Architecture for Decision Support in Net-Zero Emission Energy Systems</title>
		<author>
			<persName><forename type="first">G</forename><surname>Gamage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mills</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">De</forename><surname>Silva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Manic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Moraliyage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jennings</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Alahakoon</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICIT58233.2024.10540920</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Industrial Technology (ICIT)</title>
				<meeting><address><addrLine>Bristol, United Kingdom</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection</title>
		<author>
			<persName><forename type="first">A</forename><surname>Asai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hajishirzi</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2310.11511</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">T</forename><surname>Lange</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Foerster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clune</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ha</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2408.06292</idno>
		<title level="m">The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Ben-Porat</surname></persName>
		</author>
		<ptr target="https://clarivate.com/blog/introducing-the-clarivate-academic-ai-platform" />
		<title level="m">Introducing the Clarivate Academic AI Platform</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">GROBID: Computer software</title>
		<ptr target="https://github.com/kermitt2/grobid" />
		<imprint>
			<date type="published" when="2008">2008-2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
