<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Chain-of-Thought to Enhance Document Retrieval in Certified Medical Chatbots</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Leonardo</forename><surname>Sanna</surname></persName>
							<email>lsanna@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<settlement>Trento</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Simone</forename><surname>Magnolini</surname></persName>
							<email>magnolini@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<settlement>Trento</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Patrizio</forename><surname>Bellan</surname></persName>
							<email>pbellan@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<settlement>Trento</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Saba</forename><surname>Ghanbari Haez</surname></persName>
							<email>sghanbarihaez@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<settlement>Trento</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Free University of Bozen</orgName>
								<address>
									<settlement>Bozen</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marina</forename><surname>Segala</surname></persName>
							<email>msegala@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<settlement>Trento</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Monica</forename><surname>Consolandi</surname></persName>
							<email>mconsolandi@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<settlement>Trento</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mauro</forename><surname>Dragoni</surname></persName>
							<email>dragoni@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<settlement>Trento</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Chain-of-Thought to Enhance Document Retrieval in Certified Medical Chatbots</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">C25F03B19B7EFD548651C659E32AB997</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:09+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Conversational Agent</term>
					<term>Digital Health</term>
					<term>Chain-of-Thought</term>
					<term>Certified Information</term>
					<term>Information Retrieval&apos;s Role in RAG Systems (IR-RAG) 2024</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We propose a Retrieval-Augmented Generation pipeline aimed at retrieving certified medical information. Inspired by the recently introduced Hypothetical Document Embeddings framework, we use the LLM to generate a document with which to query our certified repository. Although it showed promising results in a first user evaluation, the proposed pipeline sometimes fails to retrieve the correct documents. We therefore propose a second, Chain-of-Thought-inspired pipeline to enhance the generation of the Hypothetical Document and, consequently, the retrieval of the certified documents.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The Hypothetical Document Embeddings (HyDE) framework was recently introduced as an effective method for building dense retrievers in a completely unsupervised fashion <ref type="bibr" target="#b0">[1]</ref>. The key idea behind HyDE is to leverage the creative abilities of the Large Language Model (LLM) to generate a Hypothetical Document (HyDoc), which is then used to retrieve a real document in a repository.</p><p>Hence, HyDE is particularly well-suited for building medical chatbots that operate with "certified information", i.e. conversational agents capable of providing trustworthy information that has been created or verified by domain experts such as physicians or other healthcare professionals in the digital health industry. To provide "certified information", the chatbot's reply must be predetermined, meaning that we have a predefined set of answers for each specific question. The existing lack of conversational datasets in the medical domain, however, poses a substantial challenge in creating a certified medical chatbot. To tackle this issue, we devised a Retrieval-Augmented Generation (RAG) pipeline within the HyDE framework so that we could benefit from the conversational capabilities of an LLM and, at the same time, exploit the LLM to retrieve the certified sources supporting the reply.</p><p>We believe that adopting HyDE addresses two major issues of RAG pipelines. First of all, we are trying to build a FAQ-based chatbot; therefore, most of the interactions with the patients would be short questions. In a FAQ-oriented conversational agent using a simple naive-RAG pipeline, the user query would be employed to retrieve the certified sources. 
Yet, since we are operating with vector databases, the vector representation of the query might be significantly distant from the certified documents in the semantic space, posing a considerable risk of excluding relevant documents in the retrieval process.</p><p>Moreover, in a digital health context, it is important to keep our certified medical chatbot explainable <ref type="bibr" target="#b1">[2]</ref>. RAG approaches add a further layer of algorithmic opacity, since the user is unaware of the documents used to generate the reply. Therefore, on the one hand, we use the retrieved documents to produce a well-grounded and informed reply, while on the other hand, we provide the certified sources that have been retrieved by computing their similarity with the HyDoc.</p><p>Nonetheless, the quality of the generated HyDoc remains a substantial issue in medical domains. Although LLMs have shown impressive results in addressing medical queries <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref>, relying solely on the LLM's abilities might result in inaccurate or low-quality HyDocs.</p><p>In fact, in a first user evaluation of our proposed modular pipeline, we found evidence that the retrieval step might be problematic when encountering specific types of questions, e.g. evaluative questions. This paper therefore introduces the main challenges we found in developing a modular RAG pipeline in a certified context. In particular, we focus on the proposal of a Chain-of-Thought-inspired pipeline to enhance the HyDoc generation and, consequently, improve the retrieval of the certified sources.</p></div>
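The semantic-distance premise behind HyDE can be illustrated with a minimal sketch. The three vectors below are invented toy embeddings in a tiny three-dimensional space, an assumption for illustration only, not outputs of any real encoder:

```python
import numpy as np

def cos(a, b):
    # Cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "semantic space" (illustrative values only).
certified_doc = np.array([0.9, 0.8, 0.7])  # long, information-dense FAQ answer
short_query   = np.array([0.9, 0.1, 0.0])  # terse patient question
hydoc         = np.array([0.8, 0.7, 0.6])  # LLM-generated hypothetical answer

# The document-shaped HyDoc lands much closer to the certified document than
# the short query does, which is the premise for querying with the HyDoc.
print(cos(short_query, certified_doc))  # noticeably lower
print(cos(hydoc, certified_doc))        # close to 1
```

With these toy values, the short query scores around 0.71 against the certified document while the HyDoc scores close to 1.0, mirroring the risk of excluding relevant documents when the raw query is used.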
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related work</head><p>LLMs' credibility and effectiveness are crucial in AI research, especially in areas like digital health and wellbeing that require precision and reliability <ref type="bibr" target="#b5">[6]</ref>. RAG and Chain of Thought (CoT) prompting are highly effective in reducing hallucinations and enhancing factual content generation in LLMs by integrating external knowledge.</p><p>RAG integrates external knowledge into LLMs' prompts through data retrieval using parametric and non-parametric memory <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. It has been shown that RAG outperforms parametric-only seq2seq models in tasks like Question Answering (QA) and summarization, improving text generation <ref type="bibr" target="#b8">[9]</ref>.</p><p>Various approaches have been explored to advance QA systems. For instance, the work <ref type="bibr" target="#b9">[10]</ref> involves a two-stage process that combines Dense Passage Retrieval (DPR) with generative sequence-to-sequence LMs. Other examples are the iterative integration of retrieval and generation <ref type="bibr" target="#b10">[11]</ref>, a combination of retrieval and generation techniques for informative answers <ref type="bibr" target="#b11">[12]</ref>, and dynamic real-time retrieval during generation <ref type="bibr" target="#b12">[13]</ref>. 
Other approaches include techniques to improve the accuracy of language models by integrating external knowledge <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>, as well as advancing implicit reasoning and adaptability in QA tasks <ref type="bibr" target="#b8">[9]</ref>.</p><p>On the other hand, CoT methods have been highly effective in improving LLMs' ability to handle complex reasoning tasks, such as those that involve heterogeneous data from tables and questions <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18]</ref>. Some recent studies have shown that breaking down problems into manageable steps significantly enhances LLMs' performance in complex reasoning tasks <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b19">20]</ref>.</p><p>The work of <ref type="bibr" target="#b20">[21]</ref> refines self-consistency decoding for broader applications like translation strategies and sentiment analysis, while <ref type="bibr" target="#b21">[22]</ref> introduces the Zero-shot-CoT approach, a technique to improve LLM performance on diverse reasoning tasks without hand-crafted few-shot examples.</p><p>Finally, we should mention the Tree of Thoughts (ToT) framework <ref type="bibr" target="#b22">[23]</ref>, which has a particularly relevant approach for QA, namely Probabilistic Tree-of-Thought Reasoning (ProbTree) <ref type="bibr" target="#b23">[24]</ref>. This approach breaks down QA into two stages, understanding and reasoning, to solve retrieval issues and prevent error propagation.</p><p>Despite the high research interest and the diversity of approaches both in RAG and CoT, there are currently no studies focusing on certified medical chatbots. 
Moving within the HyDE framework, we believe that we can employ CoT techniques to improve the generation of the Hypothetical Document that would then be used as the query to retrieve the certified documents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Dataset</head><p>In our dataset, we have three certified sources. We have (i) 179 informational cards, which were created by the Obstetrician Department of the Hospital of Trento (Italy). Then we have 953 documents from (ii) UPPA, a medical webzine, and 380 documents from (iii) ISS-Salute, which is the informative website of the Istituto Superiore di Sanità -ISS (Italian National Institute of Health).</p><p>It is important to highlight that the dataset we have is not conversational, nor is it meant to be used in a medical chatbot. All sources are what we might call content made for FAQ sections. Therefore, it is often quite verbose and dense in information. All the data we have is unstructured text, with a notable stylistic heterogeneity within the same source. This characteristic is combined with the semantic homogeneity given by the specific medical domain, creating a substantial issue for automatic topic extraction.</p><p>Finally, we should recall that content editing is not permitted due to the certified nature of our information. Since each specific question should consistently correspond to a particular set of equivalent answers, the adoption of modular RAG solutions becomes essential.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Methods</head><p>In this section, we will explain the methods used in our implementation. Our first implementation was essentially a zero-shot implementation, since we generated the HyDoc relying only on the LLM's knowledge, without providing any other context. This solution is shown in Figure <ref type="figure" target="#fig_0">1</ref>. We assessed the performance of this first implementation through a user evaluation. The technology presented in this section is the same as that used for the second implementation illustrated in Section 5.</p><p>In this work, we used GPT-4-turbo (gpt-4-0125-preview specifically) as the LLM. However, our pipeline is intended to be LLM-agnostic. The use of OpenAI-GPT should therefore be seen as a convenient solution to test our RAG pipeline using a stable and well-performing LLM. Indeed, to deploy a conversational assistant in a real-world scenario, an open-source model would likely be required due to the cost and privacy issues of accessing any LLM via API. Our approach employs a modular RAG framework designed to address the challenge of delivering natural, verified responses through a medical chatbot by leveraging unstructured data. To achieve this, we create a HyDoc in response to the user's question.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">A first (zero-shot) implementation</head><p>The essence of our strategy lies in enhancing the document retrieval process with the HyDoc. Despite the potential for inaccuracies and hallucinations, the LLM is expected to discern the fundamental aspects of the query and identify textual patterns pertinent to the specific domain of knowledge. Given the proven efficacy of LLMs in fielding medical queries <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref>, the HyDoc is anticipated to closely align with genuine documents that provide accurate, verified responses to the user's question.</p><p>To query our verified document repository, we utilize the sentence embeddings generated from our HyDoc. The area of general-purpose sentence embeddings remains an active field of research <ref type="bibr" target="#b24">[25]</ref>, in contrast to the more established universal word embedding techniques like word2vec <ref type="bibr" target="#b25">[26]</ref>. Our workflow incorporates the paraphrase-multilingual-mpnet-base-v2 Bi-Encoder model <ref type="bibr" target="#b26">[27]</ref> for generating embeddings of both the HyDoc and the verified data.</p><p>This model introduces a pooling operation to produce a fixed-size embedding vector normalized to unit length. These vectors are then compared using cosine similarity. However, the Bi-Encoder model encounters challenges in accurately comparing documents of varying lengths, which can lead to the retrieval of irrelevant documents due to the disparity in length between our HyDocs and the documents in the repository.</p><p>To address this issue, we employ the ms-marco-MiniLM-L-6-v2 cross-encoder<ref type="foot" target="#foot_0">1</ref>. 
Unlike the Bi-Encoder, which uses separate encoders for each input, the cross-encoder processes pairs of sentences through a single shared encoder, producing a joint representation that is evaluated by a classifier to yield a similarity score between the texts.</p><p>Given the computational demands of the Cross-Encoder, it is applied selectively to a shortlist of potential documents. Following the computation of cosine similarity across all HyDoc-document pairs ⟨HyDoc, D_i⟩, where i ranges from 1 to n and D_i represents the i-th document in the verified repository, we rank and select the top 50 documents for their relevance. This guarantees an acceptable number of documents from an information retrieval perspective <ref type="bibr" target="#b27">[28]</ref>. Subsequently, the top 3 documents from this refined list are chosen to augment the original prompt, enhancing the text of the final response provided to the user. This decision is based on preliminary tests indicating that using more than three documents could negatively impact the framework's effectiveness.</p><p>Finally, a Guard-Rail module<ref type="foot" target="#foot_1">2</ref> is implemented to ensure the response generated by the LLM adheres to the specified prompt length, incorporating the generated text and references to the three selected certified documents in the final answer.</p><p>An initial user evaluation of our zero-shot model was conducted using 100 questions related to pregnancy, deemed representative by expert reviewers. This evaluation focused on seven metrics: {Q1} the relevance of the answer to the question, {Q2} the relevance of the links (documents) provided, {Q3} text quality, {Q4} reliability, {Q5} clarity, {Q6} completeness, and {Q7} an overall evaluation score. 
According to Table <ref type="table">1</ref>, while the model demonstrated potential in text quality, the evaluation highlighted the need for improved document retrieval, as evidenced by the document link relevance scoring an average of 0.44. This value demonstrates that there is still room for improvement, but also that, on average, nearly half of the documents included in the links sent to the users were considered fully relevant.</p></div>
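The two-stage retrieval described above (a Bi-Encoder cosine-similarity shortlist, then Cross-Encoder re-ranking) can be sketched as follows. The `embed` and `cross_score` functions are toy stand-ins for the real models, and the vocabulary and documents are invented for illustration; only the control flow mirrors the pipeline:

```python
import numpy as np

VOCAB = ["tired", "pregnancy", "sleep", "folic", "acid", "diet", "iron"]

def embed(text):
    """Toy stand-in for the paraphrase-multilingual-mpnet-base-v2 Bi-Encoder:
    a bag-of-words vector over a tiny vocabulary, L2-normalised like the
    model's pooled sentence embedding."""
    v = np.array([text.lower().split().count(w) for w in VOCAB], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

def cross_score(a, b):
    """Toy stand-in for the ms-marco-MiniLM-L-6-v2 cross-encoder: plain token
    overlap; the real model jointly encodes the sentence pair."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve(hydoc, repository, shortlist=50, top=3):
    # Stage 1: cosine similarity (dot product of unit vectors) between the
    # HyDoc and every certified document; keep the best `shortlist` candidates.
    h = embed(hydoc)
    ranked = sorted(repository, key=lambda d: float(h @ embed(d)), reverse=True)
    candidates = ranked[:shortlist]
    # Stage 2: re-rank the shortlist with the costlier cross-encoder and return
    # the top documents used to augment the final prompt.
    return sorted(candidates, key=lambda d: cross_score(hydoc, d),
                  reverse=True)[:top]
```

Swapping in the real sentence-transformers models only changes `embed` and `cross_score`; the shortlist-then-rerank structure stays the same.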
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>The results of the first user evaluation. All metrics are Likert scales with a range of 1 to 5 except {Q1}, which is a binary metric (1 for positive, 0 for negative), and {Q2}, which is a precision score calculated on the three links provided.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Towards a CoT pipeline</head><p>As shown in Section 4, our first implementation has substantial room for improvement in the retrieval step. In particular, we noticed a decline in the link relevance evaluation for a particular type of question, i.e., evaluative questions. Evaluative questions are quite common in the medical domain, and they represent 23% of the dataset within the user evaluation we performed. In a nutshell, they are inquiries that require direct feedback on a particular aspect (e.g., "Why am I feeling so tired?"). In this case, the average link relevance is 0.31, whereas non-evaluative questions have a 0.48 average link relevance. We argue that the worse performance on evaluative questions is mostly because generating an evaluative answer might also be complex for the LLM. Moreover, the generated HyDoc would likely be a pointed reply to the precise aspect in question, since this is the natural reply expected in a conversation. Since we are retrieving full documents, the vector representation of an evaluative HyDoc might be quite distant from the original document containing the reply.</p><p>Therefore, we are annotating our dataset to enable the retrieval of shorter text segments. The idea is that we can split our documents into shorter and more meaningful segments to ease the retrieval step and enhance the generation part.</p><p>A second version of our pipeline has been tested on the subset of evaluative questions (Figure <ref type="figure" target="#fig_1">2</ref>). The new pipeline is inspired by a CoT logic and is therefore aimed at generating a better HyDoc. First, we generate the HyDoc after a naive-RAG step. In a pre-retrieval step, the user question is hence used to query our certified repository, and the retrieved context is used to generate the HyDoc. 
Moreover, we also include in the augmented prompt more contextual information about the query, aimed at enhancing the similarity between the HyDoc and the contexts that need to be retrieved. For instance, we provide within the prompt useful pragmatic information for generating an evaluative reply, such as presuppositions and implications <ref type="bibr" target="#b28">[29]</ref>.</p><p>The CoT has proven capable of enhancing the quality of the generated HyDoc. Moreover, it has shown the ability to increase the semantic similarity between the HyDoc and the relevant documents to retrieve. This comparison considers the relevant textual segments containing the pertinent information, using the paraphrase-multilingual-mpnet-base-v2 Bi-Encoder.</p><p>In the naive-RAG step, we employ a Chroma vector database. We experimented with three different embedders, namely the two OpenAI models text-embedding-3-small (hereafter GPT-small) and text-embedding-3-large (hereafter GPT-large), and the Bi-Encoder model used for the document retrieval module. As shown in Table <ref type="table" target="#tab_1">2</ref>, using CoT prompting generated a better HyDoc with OpenAI embeddings, while it does not seem influential for the Bi-Encoder model. Even though the increase in cosine similarity is small, we should recall that our documents share a considerable degree of semantic similarity. Consequently, this leads to a densely populated vector space, where even marginal enhancements in similarity can yield substantial benefits in the retrieval process. In any case, the naive-RAG step effectively enhances HyDoc similarity both with GPT-large and with the Bi-Encoder embeddings.</p><p>Finally, the last step of the pipeline uses the HyDoc, the query context, and the retrieved certified context to generate the reply. This provides the user with an appropriately framed answer as well as the documents involved in the generation process.</p></div>
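The CoT-inspired steps of Section 5 can be sketched as a short function. The prompt wording, function names, and the injected `retrieve`/`generate` callables are all assumptions for illustration; any vector store (e.g. Chroma) and any LLM could be plugged in:

```python
def cot_pipeline(question, retrieve, generate, pragmatics):
    """Sketch of a CoT-inspired HyDoc pipeline (illustrative, not the
    authors' exact prompts). `retrieve` maps a text to a list of certified
    documents; `generate` maps a prompt to LLM output."""
    # 1. Naive-RAG pre-retrieval: query the certified repository with the
    #    raw user question to gather grounding context.
    pre_context = retrieve(question)
    # 2. Generate the Hypothetical Document from the question, the
    #    pre-retrieved context, and pragmatic cues (presuppositions,
    #    implications) that help frame an evaluative reply.
    hydoc = generate(
        f"Question: {question}\nContext: {pre_context}\n"
        f"Pragmatics: {pragmatics}\n"
        "Write a short document answering the question."
    )
    # 3. Use the HyDoc, not the short query, to retrieve certified documents.
    certified = retrieve(hydoc)
    # 4. Generate the final reply grounded in the certified documents, and
    #    return the sources for transparency.
    reply = generate(
        f"Answer the question using only: {certified}\nQuestion: {question}"
    )
    return reply, certified
```

Because both callables are injected, the same skeleton works with an open-source LLM, matching the intended LLM-agnostic design.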
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions</head><p>We have presented a modular RAG approach that enables the delivery of certified medical information. The modular pipeline allowed us to operate on unstructured texts with limited data annotation possibilities. A first user evaluation showed promising results for our approach, although it revealed some flaws on a specific type of question, namely evaluative questions.</p><p>We therefore tested a CoT pipeline on this specific subtype of questions, to overcome the limitations shown in the user evaluation. This approach proved to have a positive impact on the retrieval modules, enhancing the semantic similarity between the HyDoc and the certified contexts, as well as on textual generation.</p><p>Admittedly, we should consider that we tested the CoT pipeline on a rather small dataset and that we used OpenAI-GPT as a readily available state-of-the-art LLM. Our research efforts are currently focusing on expanding the dataset and testing different open-source LLMs, as we intend our pipeline to be completely LLM-agnostic.</p><p>Finally, we should also recall that in this work we presented a user evaluation and the analysis of its results. Further work is needed to create a ground truth on a comprehensive dataset of questions to assess the performance of the retrieval modules.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: An overview of the RAG model we are implementing.</figDesc><graphic coords="2,309.59,197.35,222.75,125.63" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: The proposed CoT pipeline</figDesc><graphic coords="4,72.00,66.60,501.90,186.45" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>The average cosine similarity between the HyDoc and the actual certified context in the "Evaluative Questions" subset</figDesc><table><row><cell>Prompt</cell><cell>GPT-small</cell><cell cols="2">GPT-large Bi-encoder</cell></row><row><cell cols="2">Question + Context + Naive-RAG 0.766</cell><cell>0.820</cell><cell>0.801</cell></row><row><cell>Question + Naive-RAG</cell><cell>0.736</cell><cell>0.806</cell><cell>0.807</cell></row><row><cell>Question</cell><cell>0.717</cell><cell>0.717</cell><cell>0.717</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Refer to Mangaokar et al. https://arxiv.org/abs/2402.15911 for an example</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We acknowledge the support provided by the PNRR initiatives: INEST (Interconnected North-East Innovation Ecosystem), project code ECS00000043, and FAIR (Future AI Research), project code PE00000013. These projects are part of the NRRP MUR program, funded by the NextGenerationEU. This paper is supported by the TrustAlert project, funded by Fondazione Compagnia San Paolo and Fondazione CDP under the "Artificial Intelligence" call.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Precise zero-shot dense retrieval without relevance labels</title>
		<author>
			<persName><forename type="first">L</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Callan</surname></persName>
		</author>
		<idno type="DOI">10.18653/V1/2023.ACL-LONG.99</idno>
		<ptr target="https://doi.org/10.18653/v1/2023.acl-long.99" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Rogers</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Boyd-Graber</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Okazaki</surname></persName>
		</editor>
		<meeting>the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023<address><addrLine>Toronto, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">July 9-14, 2023. 2023</date>
			<biblScope unit="page" from="1762" to="1777" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Explainable ai (xai): A systematic meta-survey of current challenges and future opportunities</title>
		<author>
			<persName><forename type="first">W</forename><surname>Saeed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Omlin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">263</biblScope>
			<biblScope unit="page">110273</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Chatgpt-4: an assessment of an upgraded artificial intelligence chatbot in the united states medical licensing examination</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mihalache</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Popovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">H</forename><surname>Muni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Medical Teacher</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="page" from="366" to="372" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Performance of artificial intelligence chatbots in sleep medicine certification board exams: Chatgpt versus google bard</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C T</forename><surname>Cheong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">P</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Unadkat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Mcneillis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Williamson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Joseph</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Randhawa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Andrews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Paleri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">European Archives of Oto-Rhino-Laryngology</title>
		<imprint>
			<biblScope unit="page" from="1" to="7" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Evaluating the feasibility of chatgpt in healthcare: an analysis of multiple clinical and research scenarios</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cascella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Montomoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Bellini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bignami</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Medical Systems</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="page">33</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Artificial intelligence and chatbots in psychiatry</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">T</forename><surname>Pham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nabizadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Selek</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11126-022-09973-8</idno>
		<ptr target="https://doi.org/10.1007/s11126-022-09973-8" />
	</analytic>
	<monogr>
		<title level="j">Psychiatr Q</title>
		<imprint>
			<biblScope unit="volume">93</biblScope>
			<biblScope unit="page" from="249" to="253" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Retrieval-augmented generation for knowledge-intensive nlp tasks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Piktus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Karpukhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Küttler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-T</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rocktäschel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="9459" to="9474" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Dense passage retrieval for open-domain question answering</title>
		<author>
			<persName><forename type="first">V</forename><surname>Karpukhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Oguz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Min</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Edunov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-T</forename><surname>Yih</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.550</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-main.550" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">B</forename><surname>Webber</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Cohn</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</editor>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="6769" to="6781" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Improving the domain adaptation of retrieval augmented generation (RAG) models for open domain question answering</title>
		<author>
			<persName><forename type="first">S</forename><surname>Siriwardhana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weerasekera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kaluarachchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nanayakkara</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00530</idno>
		<ptr target="https://aclanthology.org/2023.tacl-1.1" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="1" to="17" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Leveraging passage retrieval with generative models for open domain question answering</title>
		<author>
			<persName><forename type="first">G</forename><surname>Izacard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.eacl-main.74</idno>
		<ptr target="https://aclanthology.org/2021.eacl-main.74" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tiedemann</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Tsarfaty</surname></persName>
		</editor>
		<meeting>the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="874" to="880" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Shao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Duan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.findings-emnlp.620</idno>
		<ptr target="https://aclanthology.org/2023.findings-emnlp.620" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Bouamor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Pino</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Bali</surname></persName>
		</editor>
		<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="9248" to="9274" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Retrieval augmented generation with rich answer encoding</title>
		<author>
			<persName><forename type="first">W</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lapata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vougiouklis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Papasarantopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Z</forename><surname>Pan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<meeting>the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1012" to="1025" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Active retrieval augmented generation</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dwivedi-Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Callan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Neubig</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.emnlp-main.495</idno>
		<ptr target="https://aclanthology.org/2023.emnlp-main.495" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Bouamor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Pino</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Bali</surname></persName>
		</editor>
		<meeting>the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="7969" to="7992" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Augmentation-adapted retriever improves generalization of language models as generic plug-in</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 61st Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="2421" to="2436" />
		</imprint>
	</monogr>
	<note>Long Papers, Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Knowledge-augmented language model verification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Baek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jeong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hwang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.emnlp-main.107</idno>
		<ptr target="https://aclanthology.org/2023.emnlp-main.107" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Bouamor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Pino</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Bali</surname></persName>
		</editor>
		<meeting>the 2023 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1720" to="1736" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Chain-of-thought prompting elicits reasoning in large language models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ichter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">H</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)</title>
				<meeting>the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Chain-of-thought reasoning in tabular language models</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lyu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>She</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="11006" to="11019" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Ai chains: Transparent and controllable human-ai interaction by chaining large language model prompts</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Terry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Cai</surname></persName>
		</author>
		<idno type="DOI">10.1145/3491102.3517582</idno>
		<ptr target="https://doi.org/10.1145/3491102.3517582" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI)</title>
				<meeting>the 2022 CHI Conference on Human Factors in Computing Systems (CHI)<address><addrLine>New Orleans, LA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Pal: Program-aided language models</title>
		<author>
			<persName><forename type="first">L</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Madaan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Alon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Callan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Neubig</surname></persName>
		</author>
		<ptr target="http://reasonwithpal.com" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th International Conference on Machine Learning (ICML)</title>
				<meeting>the 40th International Conference on Machine Learning (ICML)<address><addrLine>PMLR, Honolulu, Hawaii, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Deductive verification of chain-of-thought reasoning</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Ling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Memisevic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Su</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">37th Conference on Neural Information Processing Systems (NeurIPS 2023)</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Self-consistency improves chain of thought reasoning in language models</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">H</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chowdhery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations (ICLR)</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Large language models are zero-shot reasoners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kojima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Reid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Matsuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Iwasawa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)</title>
				<meeting>the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Tree of thoughts: Deliberate problem solving with large language models</title>
		<author>
			<persName><forename type="first">S</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Shafran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Griffiths</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Narasimhan</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2023/file/271db9922b8d1f4dd7aaef84ed5ac703-Paper-Conference.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Oh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Neumann</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Globerson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Saenko</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Hardt</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Levine</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="11809" to="11822" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Probabilistic tree-of-thought reasoning for answering knowledge-intensive complex questions</title>
		<author>
			<persName><forename type="first">S</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Lv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP, Association for Computational Linguistics</title>
				<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="12541" to="12560" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">A brief overview of universal sentence representation methods: A linguistic view</title>
		<author>
			<persName><forename type="first">R</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Moens</surname></persName>
		</author>
		<idno type="DOI">10.1145/3482853</idno>
		<ptr target="https://doi.org/10.1145/3482853" />
	</analytic>
	<monogr>
		<title level="j">ACM Comput. Surv</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page">42</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Sentence-bert: Sentence embeddings using siamese bert-networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="DOI">10.18653/V1/D19-1410</idno>
		<ptr target="https://doi.org/10.18653/v1/D19-1410" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3980" to="3990" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Learning to rank for information retrieval and natural language processing</title>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
			<publisher>Springer Nature</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Logic and conversation</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">P</forename><surname>Grice</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Speech acts</title>
				<imprint>
			<publisher>Brill</publisher>
			<date type="published" when="1975">1975</date>
			<biblScope unit="page" from="41" to="58" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
