<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Preparing AI for Compliance: Initial Steps of a Framework for Teaching LLMs to Reason About Compliance</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Barbara</forename><surname>Makovec</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Institut &quot;Jožef Stefan&quot;</orgName>
								<address>
									<addrLine>Jamova 39</addrLine>
									<settlement>Ljubljana</settlement>
									<country key="SI">Slovenia</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Faculty of Mathematics and Physics</orgName>
								<orgName type="institution">University of Ljubljana</orgName>
								<address>
									<addrLine>Jadranska 19</addrLine>
									<settlement>Ljubljana</settlement>
									<country key="SI">Slovenia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Luis</forename><surname>Rei</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Institut &quot;Jožef Stefan&quot;</orgName>
								<address>
									<addrLine>Jamova 39</addrLine>
									<settlement>Ljubljana</settlement>
									<country key="SI">Slovenia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Inna</forename><surname>Novalija</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Institut &quot;Jožef Stefan&quot;</orgName>
								<address>
									<addrLine>Jamova 39</addrLine>
									<settlement>Ljubljana</settlement>
									<country key="SI">Slovenia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Preparing AI for Compliance: Initial Steps of a Framework for Teaching LLMs to Reason About Compliance</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">87F9B5884F886791113AF331540D3215</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:27+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Large Language Models (LLMs)</term>
					<term>Regulatory Reasoning</term>
					<term>Retrieval-Augmented Generation (RAG)</term>
					<term>Chain-of-Thought</term>
					<term>Text mining</term>
					<term>AI Governance</term>
					<term>Fair, Transparent, and Trustworthy AI</term>
					<term>Artificial Intelligence (AI) Compliance</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The integration of powerful Large Language Models into diverse applications has been rapid, but it faces significant challenges due to the complexity of global regulatory and ethical frameworks, such as those in the GDPR and the AI Act. To address the need for AI systems that can navigate these compliance requirements, we propose a tool designed to create a specialized dataset for training AI assistants in regulatory and ethical reasoning, and present its initial implementation. Our approach uses a Retrieval-Augmented Generation (RAG) method that preserves the structure of legal texts, ensuring accurate retrieval and interpretation of relevant provisions. This tool automates the generation of compliance reasoning data by selecting and explaining how specific legal and ethical guidelines impact real-world examples of AI technologies. This is to be followed by a refinement process to ensure only the best candidates are presented to the annotators. We aim to facilitate the development of AI-driven compliance assistants that can effectively align with global legal and ethical standards.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years, we have witnessed the disruptive emergence of powerful Large Language Models, which can be utilized as ready-to-deploy AI services with minimal effort. Their rapid adoption spans from small-scale single-developer projects to critical integrations within Fortune 500 companies. Simultaneously, a wave of legislation, regulations, ethical guidelines, and policy goals has emerged in the technology and data sectors, such as the GDPR 1 , the Data Governance Act 2 , the Data Act 3 , and the Artificial Intelligence Act 4 . This rapid technological advancement, coupled with diverse and evolving regulatory landscapes across different countries, presents significant challenges for developers, data scientists, researchers, regulators, and policymakers. We believe that leveraging Large Language Models (LLMs) to explain, review, and assess AI models, datasets, and complete pipelines from the perspective of legislation, regulations, ethical guidelines, and social impact can help address these challenges. For instance, a data scientist developing a new pipeline could ensure compliance with EU and USA regulations by submitting the pipeline description, along with each dataset and model card, to the compliance assistant. By selecting the relevant jurisdictions, potential issues can be identified early in the development process, facilitating faster progress before a more detailed review by the company's compliance experts.</p><p>Beyond just understanding the law, any general solution will likely require some form of Retrieval-Augmented Generation (RAG), in which the LLM can reason over the specific set of retrieved compliance requirements that can apply to a single product, service, or company at a given point in time within a certain jurisdiction.
</p><p>The first step towards developing a "compliance assistant" is to build datasets that can be used to teach and evaluate the assistant in this complex task. Annotating and labeling this data demands the expertise of legal professionals to ensure accuracy, making the process both time-consuming and expensive. To address this challenge, we propose a framework that generates high-quality examples for annotation (Figure <ref type="figure" target="#fig_0">1</ref>). In this paper, we discuss the details of the first part, the initial generation of examples.</p><note place="foot">RuleML+RR'24: Companion Proceedings of the 8th International Joint Conference on Rules and Reasoning, September 16-22, 2024, Bucharest, Romania. * Corresponding author: makovecbarbara1@gmail.com (B. Makovec); luis.rei@ijs.si (L. Rei); inna.koval@ijs.si (I. Novalija).</note></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Our ultimate goal of creating a compliance assistant is not conceptually unique. For example, Gracenote.ai<ref type="foot" target="#foot_0">5</ref> is an AI-driven platform for regulatory compliance, while Thomson Reuters' legal AI CoCounsel<ref type="foot" target="#foot_1">6</ref> includes contract compliance features. CuratedAI<ref type="foot" target="#foot_2">7</ref> uses a RAG approach to answer legal questions about EU laws and regulations. Among complete research systems, we highlight DISC-LawLLM, which includes a retriever with access to a knowledge base of Chinese laws <ref type="bibr" target="#b0">[1]</ref>, and Chatlaw, which dynamically builds a case-specific knowledge graph within a multi-agent system and answers using a RAG approach <ref type="bibr" target="#b1">[2]</ref>. Several public datasets evaluate LLM assistants' legal reasoning, such as LegalBench <ref type="bibr" target="#b2">[3]</ref> and the Contract Understanding Atticus Dataset <ref type="bibr" target="#b3">[4]</ref>. Our goal is slightly different: we want to reason about the compliance of AI tools with a variable set of provisions. Given an LLM that is instructed to reason only on specific retrieved provisions, the user can select which provisions are considered by selecting those that can be retrieved, e.g. only laws that apply in the EU, plus provisions that apply to the financial sector, plus the user's ethical guidelines. For generating better responses, Chain-of-Thought prompting enhances LLM reasoning by generating intermediate steps <ref type="bibr" target="#b4">[5]</ref>, and LLMs can perform zero-shot reasoning by adding "Let's think step by step" before answers <ref type="bibr" target="#b5">[6]</ref>. Self-Consistency improves on this by sampling diverse reasoning paths and selecting the most consistent answer <ref type="bibr" target="#b6">[7]</ref>.
Additionally, LLMs can self-improve by generating high-confidence, rationale-augmented answers and fine-tuning on them <ref type="bibr" target="#b7">[8]</ref>. The SELF-DISCOVER framework allows LLMs to self-compose reasoning structures from atomic modules <ref type="bibr" target="#b8">[9]</ref>, and the Self-Instruct framework enhances instruction-following capabilities through self-generated instructions <ref type="bibr" target="#b9">[10]</ref>. For ranking and selecting model responses, the use of strong LLMs as judges to evaluate responses to open-ended questions has become one of the most popular options <ref type="bibr" target="#b10">[11]</ref>. Building on this, a Panel of LLM evaluators (PoLL) has been proposed to provide a more diverse and balanced evaluation <ref type="bibr" target="#b11">[12]</ref>. The Llama Guard model introduces an LLM-based input-output safeguard for classifying and evaluating responses, which can filter out undesirable ones <ref type="bibr" target="#b12">[13]</ref>. Self-Refine introduces an iterative feedback mechanism in which an LLM generates an initial output, provides feedback on that output, and then refines it based on this feedback <ref type="bibr" target="#b13">[14]</ref>. The utility of LLM critics has been demonstrated in code and mathematics evaluation, where LLMs provide natural language feedback that highlights issues in code <ref type="bibr" target="#b14">[15]</ref> or proofs <ref type="bibr" target="#b15">[16]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data and Methods</head><p>We focus on the candidate generation phase of our framework (as shown in Figure <ref type="figure" target="#fig_0">1</ref>). This process uses a RAG approach, starting with the selection of examples from our database, which includes news articles about specific AI technologies or incidents, GitHub README files from AI-related repositories, and Hugging Face model and dataset cards. The next step retrieves relevant sections of legal and ethical provisions from our knowledge base, identified through similarity search. The retrieved provisions are then combined, and the language model is prompted to reason about and explain how they impact the selected example using a zero-shot Chain-of-Thought (CoT) prompt <ref type="bibr" target="#b5">[6]</ref>.</p><p>A common limitation of many RAG pipelines is their disregard for the structural integrity of documents, which are often divided into uniform-length chunks. This can lead to critical oversights, especially when dealing with legal documents, which are typically organized into articles and paragraphs. We employ a systematic approach to structuring and querying legal documents for efficient retrieval and compliance analysis, as described in Algorithm <ref type="figure" target="#fig_1">1</ref>. The legal document 𝐿 is divided into its pre-defined articles and paragraphs as they are structured in the base document. Each paragraph is further segmented into overlapping passages of fixed length 𝑠 with an overlap 𝑜 to maintain context across segments. Each passage is then encoded using a dense retrieval embedding model. At query time, we embed the query and compute the dot-product similarity between the embeddings of the query and the stored passages. We retrieve the top 𝑘 passages with the highest scores, look up the articles to which these passages belong, and generate a prompt using a predefined template and 𝑛 of these articles.
The prompt forms a question asking the LLM to analyze step by step <ref type="bibr" target="#b5">[6]</ref> the implications of the provided legislative articles with respect to the query. In our initial experiments, we used the EU AI Act as our legislative text, with queries consisting of sentences reporting on AI-related incidents from the news, dataset and model cards, and open-source AI project README files. The retrieval model used was the small BGE [17] model<ref type="foot" target="#foot_3">8</ref> for dense retrieval, while the LLM was . The parameters used were 𝑠 = 184, 𝑜 = 30, 𝑘 = 10, and 𝑡 = 0.3, determined heuristically. We explored creating queries with both 𝑛 = 1 and 𝑛 = 𝑘; the choice determines how many articles are included in a single query. An example prompt template is shown in Listing 1.</p></div>
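The structure-preserving indexing and dot-product retrieval steps described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: a toy hash-based bag-of-words embedder stands in for the dense BGE model, and the function names (index_document, retrieve) are assumptions.

```python
import re

import numpy as np


def embed(text, dim=256):
    # Toy stand-in for a dense embedding model such as bge-small-en-v1.5:
    # a hashed, L2-normalized bag-of-words vector. For illustration only.
    v = np.zeros(dim)
    for tok in re.findall(r"\w+", text.lower()):
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def index_document(articles, s=184, o=30):
    # articles: mapping from article id to a list of paragraph strings.
    # Each paragraph is partitioned into overlapping passages of s tokens
    # with overlap o; every passage remembers its parent article.
    passages = []
    step = max(1, s - o)
    for art_id, paragraphs in articles.items():
        for para in paragraphs:
            toks = para.split()
            for start in range(0, max(1, len(toks)), step):
                chunk = " ".join(toks[start:start + s])
                if chunk:
                    passages.append((art_id, chunk, embed(chunk)))
    return passages


def retrieve(query, passages, k=10, t=0.3):
    # Dot-product similarity between the query embedding and each passage;
    # keep the top-k passages scoring at least t, then look up the unique
    # articles those passages belong to, in score order.
    q = embed(query)
    scored = sorted(passages, key=lambda p: float(q @ p[2]), reverse=True)
    top = [p for p in scored[:k] if float(q @ p[2]) >= t]
    seen, found = set(), []
    for art_id, _, _ in top:
        if art_id not in seen:
            seen.add(art_id)
            found.append(art_id)
    return found
```

The returned article identifiers would then be used to look up the full article texts and fill the prompt template, selecting 𝑛 of them per query.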
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Listing 1: Example Prompt for Legal Compliance Analysis</head><p>Consider the following articles of legislation, provided between triple backticks, and nothing else: ```{articles}``` Under these articles and only these articles, and ignoring those that are not applicable, as a legal compliance expert, answer: what are the implications of that legislation to the following {example type}, provided between triple backticks: ```{query}``` Let's think step by step.</p></div>
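Programmatically, the Listing 1 template can be filled in with standard string formatting. The helper below is a hypothetical sketch: the placeholder names mirror the listing ("example type" becomes example_type), and the build_prompt name is an assumption.

```python
# Hypothetical sketch of filling the Listing 1 prompt template.
TEMPLATE = (
    "Consider the following articles of legislation, provided between "
    "triple backticks, and nothing else:\n"
    "```{articles}```\n"
    "Under these articles and only these articles, and ignoring those that "
    "are not applicable, as a legal compliance expert, answer: what are the "
    "implications of that legislation to the following {example_type}, "
    "provided between triple backticks:\n"
    "```{query}```\n"
    "Let's think step by step."
)


def build_prompt(articles, example_type, query):
    # articles: list of article texts retrieved for this query.
    return TEMPLATE.format(
        articles="\n\n".join(articles),
        example_type=example_type,
        query=query,
    )
```

The resulting string ends with the zero-shot CoT trigger "Let's think step by step." from Kojima et al. [6], matching the prompting strategy described in Section 3.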
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions and Future Work</head><p>In this work, we introduced the initial phase of a framework and tool designed to prepare datasets for training Large Language Models (LLMs) to perform compliance reasoning in AI applications. Our approach preserves the critical structure and content of legal provisions within a Retrieval-Augmented Generation (RAG) setting, ensuring more accurate and contextually aware reasoning. Our proposed framework offers significant advantages for companies developing and deploying AI systems across different regulatory landscapes. By integrating a compliance assistant into the AI development process, companies can proactively ensure that their models and data pipelines comply with complex regulations, identify potential legal issues early in the development cycle, and streamline the process by reducing the need for extensive manual reviews by legal experts. As a result, companies can reduce compliance risks, accelerate time-to-market, and maintain high standards of ethical and legal accountability in their AI initiatives.</p><p>Looking ahead, our next steps will focus on the implementation of the refinement loop. Additionally, we plan to explore the tool's potential use by the public and policymakers to raise awareness and deepen understanding of AI technologies and the associated regulatory landscape.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Our framework leveraging RAG and an LLM to generate, judge, criticize, and refine candidate examples.</figDesc><graphic coords="3,72.00,65.60,451.29,175.23" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Algorithm 1:</head><label>1</label><figDesc>Legal Text Indexing and Retrieval Augmented Generation. Input: legal document 𝐿, query 𝑄, embedding model 𝐸, parameters 𝑠, 𝑜, 𝑘, 𝑡, 𝑛. Output: LLM-generated candidate responses based on 𝑄. 1: Indexing: 2: Split 𝐿 into articles 𝒜 = {𝐴 1 , … , 𝐴 𝑥 } 3: for each 𝐴 𝑥 in 𝒜 do 4: Split 𝐴 𝑥 into paragraphs 𝑃 𝑦 = {𝑃 𝑥1 , … , 𝑃 𝑥𝑦 } 5: for each 𝑃 𝑥𝑦 in 𝐴 𝑥 do 6: Partition 𝑃 𝑥𝑦 into overlapping passages 𝑔 𝑥𝑦𝑧 of length 𝑠 with overlap 𝑜 7: Encode 𝑔 𝑥𝑦𝑧 using model 𝐸</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>10:</head><label>10</label><figDesc>Retrieval Augmented Generation: 11: Encode query 𝑄 using model 𝐸 12: Compute similarity scores between the encoded 𝑄 and each encoded passage 𝑔 𝑥𝑦𝑧 13: Retrieve top 𝑘 passages {𝑔 1 , … , 𝑔 𝑘 } with a similarity score ≥ 𝑡 14: Get the subset of articles 𝒜 𝑢 to which the passages {𝑔 1 , … , 𝑔 𝑘 } belong 15: for each subset of up to 𝑛 articles in 𝒜 𝑢 do 16: Construct prompt 𝑀 𝑄 and obtain LLM response 𝑅 𝑄 17: end for</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_0">https://gracenote.ai/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_1">https://casetext.com/cocounsel/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_2">https://www.curatedai.eu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_3">https://huggingface.co/BAAI/bge-small-en-v1.5</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Acknowledgments</head><p>This work was supported by the European Union through enrichMyData EU HORIZON-IA project under grant agreement No 101070284 and ELIAS HORIZON-RIA project under grant agreement No 101120237.</p></div>
			</div>


			<div type="availability">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>2 https://digital-strategy.ec.europa.eu/en/policies/data-governance-act 3 https://digital-strategy.ec.europa.eu/en/policies/data-act 4 https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Yue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wei</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2309.11325</idno>
		<title level="m">Disc-lawllm: Fine-tuning large language models for intelligent legal services</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yuan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2306.16092</idno>
		<title level="m">Chatlaw: A multi-agent collaborative legal assistant with knowledge graph enhanced mixture-of-experts large language model</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Legalbench: A collaboratively built benchmark for measuring legal reasoning in large language models</title>
		<author>
			<persName><forename type="first">N</forename><surname>Guha</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023</title>
				<meeting><address><addrLine>New Orleans, LA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">December 10-16, 2023</date>
			<biblScope unit="page" from="44123" to="44279" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">CUAD: an expert-annotated NLP dataset for legal contract review</title>
		<author>
			<persName><forename type="first">D</forename><surname>Hendrycks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Burns</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ball</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Vanschoren</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Yeung</surname></persName>
		</editor>
		<meeting>the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021<address><addrLine>virtual</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021-12">December 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Chain-of-thought prompting elicits reasoning in large language models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ichter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">H</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Koyejo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Mohamed</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Agarwal</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Belgrave</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Oh</surname></persName>
		</editor>
		<meeting><address><addrLine>New Orleans, LA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022-12-09">November 28 - December 9, 2022</date>
			<biblScope unit="page" from="24824" to="24837" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Large language models are zero-shot reasoners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kojima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Reid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Matsuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Iwasawa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022</title>
				<meeting><address><addrLine>New Orleans, LA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022-12-09">November 28 - December 9, 2022</date>
			<biblScope unit="page" from="22199" to="22213" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Self-consistency improves chain of thought reasoning in language models</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">H</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chowdhery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Eleventh International Conference on Learning Representations, ICLR 2023</title>
				<meeting><address><addrLine>Kigali, Rwanda</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">May 1-5, 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Large language models can self-improve</title>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<idno type="DOI">10.18653/V1/2023.EMNLP-MAIN.67</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023</title>
				<meeting>the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023<address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">December 6-10, 2023</date>
			<biblScope unit="page" from="1051" to="1068" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.03620</idno>
		<title level="m">Self-discover: Large language models self-compose reasoning structures</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Self-instruct: Aligning language models with self-generated instructions</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kordi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Khashabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hajishirzi</surname></persName>
		</author>
		<idno type="DOI">10.18653/V1/2023.ACL-LONG.754</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 61st Annual Meeting of the Association for Computational Linguistics<address><addrLine>Toronto, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">July 9-14, 2023</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="13484" to="13508" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Judging llm-as-a-judge with mt-bench and chatbot arena</title>
		<author>
			<persName><forename type="first">L</forename><surname>Zheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023</title>
				<meeting><address><addrLine>New Orleans, LA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">December 10-16, 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Replacing judges with juries: Evaluating LLM generations with a panel of diverse models</title>
		<author>
			<persName><forename type="first">P</forename><surname>Verga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hofstätter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Althammer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Piktus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Arkhangorodsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>White</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S H</forename><surname>Lewis</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2404.18796</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Inan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Upasani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rungta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Iyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tontchev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Fuller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Testuggine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Khabsa</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2312.06674</idno>
		<title level="m">Llama Guard: LLM-based input-output safeguard for human-AI conversations</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Self-refine: Iterative refinement with self-feedback</title>
		<author>
			<persName><forename type="first">A</forename><surname>Madaan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023</title>
				<meeting><address><addrLine>New Orleans, LA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">December 10-16, 2023</date>
			<biblScope unit="page" from="46534" to="46594" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">LLM critics help catch LLM bugs</title>
		<author>
			<persName><forename type="first">N</forename><surname>McAleese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Pokorny</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">F C</forename><surname>Uribe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Nitishinskaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Trebacz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leike</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2407.00215</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">LLM critics help catch bugs in mathematics: Towards a better mathematical verifier with natural language feedback</title>
		<author>
			<persName><forename type="first">B</forename><surname>Gao</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2406.14024</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Muennighoff</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2309.07597</idno>
		<title level="m">C-Pack: Packaged resources to advance general Chinese embedding</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
